All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCHv2 00/11] iscsi/qemu-img/block-migration enhancements
@ 2013-06-27 13:11 Peter Lieven
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 01/11] iscsi: add logical block provisioning information to iscsilun Peter Lieven
                   ` (10 more replies)
  0 siblings, 11 replies; 69+ messages in thread
From: Peter Lieven @ 2013-06-27 13:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Peter Lieven, ronniesahlberg, stefanha

this series adds logical block provisioning functions to the iscsi layer.
it adds the ability to write sparse images to iscsi targets which
support UNMAP and LBPRZ via qemu-img. additional block-migration is
enhanced to support an encoding for zero blocks on the network
and efficiently writing those zero blocks at the destination if
the driver supports it.

many thanks to Paolo, Kevin and Ronnie for their useful comments.

changes in v2:
  - added patch to read unmap limits from block limits VPD page
  - use a generic callback and new framework for coroutines that
    can also be used for further porting of the remaining aio
    routines. iscsi_co_{write_zeroes,is_allocated} use this new
    framework. commands are now also retried on check conditions.
  - iscsi_co_write_zeroes checks now for BDRV_O_UNMAP and
    if the request is not too big for a single UNMAP call.
  - added a patch to let iscsi_create unmap the whole device
    if thin provisioning is supported and the unmapped blocks
    read as zero. in this case return .has_zero_init = 1.
  - added a patch to let aio_discard silently fail if UNMAP
    is not supported or the request is too big.
  - factored out almost all conversions between qemu and lun
    sector size.
  - changed the alignment checks from assert() to failing
    the request.
  - dropped the qemu-img patch that called out to bdrv_write_zeroes.

Peter


Peter Lieven (11):
  iscsi: add logical block provisioning information to iscsilun
  iscsi: read unmap info from block limits vpd page
  iscsi: add bdrv_co_is_allocated
  iscsi: add bdrv_co_write_zeroes
  block: add bdrv_write_zeroes()
  block/raw: add bdrv_co_write_zeroes
  iscsi: let bdrv_create conditionally zero out the device
  block-migration: efficiently encode zero blocks
  iscsi: factor out sector conversions
  iscsi: ignore aio_discard if unsupported
  iscsi: assert that sectors are aligned to LUN blocksize

 block-migration.c             |   29 +++-
 block.c                       |   27 ++-
 block/iscsi.c                 |  365 ++++++++++++++++++++++++++++++++++++++++-
 block/raw.c                   |    8 +
 include/block/block.h         |    2 +
 include/migration/qemu-file.h |    1 +
 savevm.c                      |    2 +-
 7 files changed, 414 insertions(+), 20 deletions(-)

-- 
1.7.9.5

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCHv2 01/11] iscsi: add logical block provisioning information to iscsilun
  2013-06-27 13:11 [Qemu-devel] [PATCHv2 00/11] iscsi/qemu-img/block-migration enhancements Peter Lieven
@ 2013-06-27 13:11 ` Peter Lieven
  2013-07-01 13:35   ` Stefan Hajnoczi
  2013-07-10  9:19   ` Kevin Wolf
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 02/11] iscsi: read unmap info from block limits vpd page Peter Lieven
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 69+ messages in thread
From: Peter Lieven @ 2013-06-27 13:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Peter Lieven, ronniesahlberg, stefanha

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/iscsi.c |   47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index 0bbf0b1..a38a1bf 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -49,6 +49,11 @@ typedef struct IscsiLun {
     uint64_t num_blocks;
     int events;
     QEMUTimer *nop_timer;
+    uint8_t lbpme;
+    uint8_t lbprz;
+    uint8_t lbpu;
+    uint8_t lbpws;
+    uint8_t lbpws10;
 } IscsiLun;
 
 typedef struct IscsiAIOCB {
@@ -948,6 +953,8 @@ static int iscsi_readcapacity_sync(IscsiLun *iscsilun)
                 } else {
                     iscsilun->block_size = rc16->block_length;
                     iscsilun->num_blocks = rc16->returned_lba + 1;
+                    iscsilun->lbpme = rc16->lbpme;
+                    iscsilun->lbprz = rc16->lbprz;
                 }
             }
             break;
@@ -1130,6 +1137,46 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags)
         bs->sg = 1;
     }
 
+    if (iscsilun->lbpme) {
+        struct scsi_inquiry_logical_block_provisioning *inq_lbp;
+        int full_size;
+
+        task = iscsi_inquiry_sync(iscsi, iscsilun->lun, 1,
+                                  SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING,
+                                  64);
+        if (task == NULL || task->status != SCSI_STATUS_GOOD) {
+            error_report("iSCSI: Inquiry command failed : %s",
+                   iscsi_get_error(iscsilun->iscsi));
+            ret = -EINVAL;
+            goto out;
+        }
+        full_size = scsi_datain_getfullsize(task);
+        if (full_size > task->datain.size) {
+            scsi_free_scsi_task(task);
+
+            /* we need more data for the full list */
+            task = iscsi_inquiry_sync(iscsi, iscsilun->lun, 1,
+                                      SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING,
+                                      full_size);
+            if (task == NULL || task->status != SCSI_STATUS_GOOD) {
+                error_report("iSCSI: Inquiry command failed : %s",
+                             iscsi_get_error(iscsilun->iscsi));
+                ret = -EINVAL;
+                goto out;
+            }
+        }
+
+        inq_lbp = scsi_datain_unmarshall(task);
+        if (inq_lbp == NULL) {
+            error_report("iSCSI: failed to unmarshall inquiry datain blob");
+            ret = -EINVAL;
+            goto out;
+        }
+        iscsilun->lbpu = inq_lbp->lbpu;
+        iscsilun->lbpws = inq_lbp->lbpws;
+        iscsilun->lbpws10 = inq_lbp->lbpws10;
+    }
+
 #if defined(LIBISCSI_FEATURE_NOP_COUNTER)
     /* Set up a timer for sending out iSCSI NOPs */
     iscsilun->nop_timer = qemu_new_timer_ms(rt_clock, iscsi_nop_timed_event, iscsilun);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCHv2 02/11] iscsi: read unmap info from block limits vpd page
  2013-06-27 13:11 [Qemu-devel] [PATCHv2 00/11] iscsi/qemu-img/block-migration enhancements Peter Lieven
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 01/11] iscsi: add logical block provisioning information to iscsilun Peter Lieven
@ 2013-06-27 13:11 ` Peter Lieven
  2013-07-03  3:43   ` ronnie sahlberg
                     ` (2 more replies)
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 03/11] iscsi: add bdrv_co_is_allocated Peter Lieven
                   ` (8 subsequent siblings)
  10 siblings, 3 replies; 69+ messages in thread
From: Peter Lieven @ 2013-06-27 13:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Peter Lieven, ronniesahlberg, stefanha

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block/iscsi.c |   80 ++++++++++++++++++++++++++++++++++++++++-----------------
 1 file changed, 56 insertions(+), 24 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index a38a1bf..2e2455d 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -54,6 +54,8 @@ typedef struct IscsiLun {
     uint8_t lbpu;
     uint8_t lbpws;
     uint8_t lbpws10;
+    uint32_t max_unmap;
+    uint32_t max_unmap_bdc;
 } IscsiLun;
 
 typedef struct IscsiAIOCB {
@@ -1007,6 +1009,37 @@ static QemuOptsList runtime_opts = {
     },
 };
 
+static struct scsi_task *iscsi_do_inquiry(struct iscsi_context *iscsi,
+                                          int lun, int evpd, int pc) {
+        int full_size;
+        struct scsi_task *task = NULL;
+        task = iscsi_inquiry_sync(iscsi, lun, evpd, pc, 64);
+        if (task == NULL || task->status != SCSI_STATUS_GOOD) {
+            goto fail;
+        }
+        full_size = scsi_datain_getfullsize(task);
+        if (full_size > task->datain.size) {
+            scsi_free_scsi_task(task);
+
+            /* we need more data for the full list */
+            task = iscsi_inquiry_sync(iscsi, lun, evpd, pc, full_size);
+            if (task == NULL || task->status != SCSI_STATUS_GOOD) {
+                goto fail;
+            }
+        }
+
+        return task;
+
+fail:
+        error_report("iSCSI: Inquiry command failed : %s",
+                     iscsi_get_error(iscsi));
+        if (task) {
+            scsi_free_scsi_task(task);
+            return NULL;
+        }
+        return NULL;
+}
+
 /*
  * We support iscsi url's on the form
  * iscsi://[<username>%<password>@]<host>[:<port>]/<targetname>/<lun>
@@ -1139,33 +1172,12 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags)
 
     if (iscsilun->lbpme) {
         struct scsi_inquiry_logical_block_provisioning *inq_lbp;
-        int full_size;
-
-        task = iscsi_inquiry_sync(iscsi, iscsilun->lun, 1,
-                                  SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING,
-                                  64);
-        if (task == NULL || task->status != SCSI_STATUS_GOOD) {
-            error_report("iSCSI: Inquiry command failed : %s",
-                   iscsi_get_error(iscsilun->iscsi));
+        task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
+                                SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING);
+        if (task == NULL) {
             ret = -EINVAL;
             goto out;
         }
-        full_size = scsi_datain_getfullsize(task);
-        if (full_size > task->datain.size) {
-            scsi_free_scsi_task(task);
-
-            /* we need more data for the full list */
-            task = iscsi_inquiry_sync(iscsi, iscsilun->lun, 1,
-                                      SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING,
-                                      full_size);
-            if (task == NULL || task->status != SCSI_STATUS_GOOD) {
-                error_report("iSCSI: Inquiry command failed : %s",
-                             iscsi_get_error(iscsilun->iscsi));
-                ret = -EINVAL;
-                goto out;
-            }
-        }
-
         inq_lbp = scsi_datain_unmarshall(task);
         if (inq_lbp == NULL) {
             error_report("iSCSI: failed to unmarshall inquiry datain blob");
@@ -1175,6 +1187,26 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags)
         iscsilun->lbpu = inq_lbp->lbpu;
         iscsilun->lbpws = inq_lbp->lbpws;
         iscsilun->lbpws10 = inq_lbp->lbpws10;
+        scsi_free_scsi_task(task);
+        task = NULL;
+    }
+
+    if (iscsilun->lbpu) {
+        struct scsi_inquiry_block_limits *inq_bl;
+        task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
+                                SCSI_INQUIRY_PAGECODE_BLOCK_LIMITS);
+        if (task == NULL) {
+            ret = -EINVAL;
+            goto out;
+        }
+        inq_bl = scsi_datain_unmarshall(task);
+        if (inq_bl == NULL) {
+            error_report("iSCSI: failed to unmarshall inquiry datain blob");
+            ret = -EINVAL;
+            goto out;
+        }
+        iscsilun->max_unmap = inq_bl->max_unmap;
+        iscsilun->max_unmap_bdc = inq_bl->max_unmap_bdc;
     }
 
 #if defined(LIBISCSI_FEATURE_NOP_COUNTER)
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCHv2 03/11] iscsi: add bdrv_co_is_allocated
  2013-06-27 13:11 [Qemu-devel] [PATCHv2 00/11] iscsi/qemu-img/block-migration enhancements Peter Lieven
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 01/11] iscsi: add logical block provisioning information to iscsilun Peter Lieven
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 02/11] iscsi: read unmap info from block limits vpd page Peter Lieven
@ 2013-06-27 13:11 ` Peter Lieven
  2013-07-01 13:46   ` Stefan Hajnoczi
  2013-07-10  9:41   ` Kevin Wolf
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 04/11] iscsi: add bdrv_co_write_zeroes Peter Lieven
                   ` (7 subsequent siblings)
  10 siblings, 2 replies; 69+ messages in thread
From: Peter Lieven @ 2013-06-27 13:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Peter Lieven, ronniesahlberg, stefanha

this patch adds a coroutine for .bdrv_co_is_allocated as well as
a generic framework that can be used to build coroutines in block/iscsi.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block/iscsi.c |  123 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 123 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index 2e2455d..d31ee95 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -58,6 +58,15 @@ typedef struct IscsiLun {
     uint32_t max_unmap_bdc;
 } IscsiLun;
 
+typedef struct IscsiTask {
+    int status;
+    int complete;
+    int retries;
+    int do_retry;
+    struct scsi_task *task;
+    Coroutine *co;
+} IscsiTask;
+
 typedef struct IscsiAIOCB {
     BlockDriverAIOCB common;
     QEMUIOVector *qiov;
@@ -113,6 +122,45 @@ iscsi_schedule_bh(IscsiAIOCB *acb)
     qemu_bh_schedule(acb->bh);
 }
 
+static void
+iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
+                        void *command_data, void *opaque)
+{
+    struct IscsiTask *iTask = opaque;
+    struct scsi_task *task = command_data;
+
+    iTask->complete = 1;
+    iTask->status = status;
+    iTask->do_retry = 0;
+    iTask->task = task;
+
+    if (iTask->retries-- > 0 && status == SCSI_STATUS_CHECK_CONDITION
+        && task->sense.key == SCSI_SENSE_UNIT_ATTENTION) {
+        iTask->do_retry = 1;
+        goto out;
+    }
+
+    if (status != SCSI_STATUS_GOOD) {
+        error_report("iSCSI: Failure. %s", iscsi_get_error(iscsi));
+        goto out;
+    }
+
+out:
+    if (iTask->status != SCSI_STATUS_GOOD) {
+        scsi_free_scsi_task(iTask->task);
+        iTask->task = NULL;
+    }
+    if (iTask->co) {
+        qemu_coroutine_enter(iTask->co, NULL);
+    }
+}
+
+static void iscsi_co_init_iscsitask(IscsiLun *iscsilun, struct IscsiTask *iTask)
+{
+    memset(iTask, 0x00, sizeof(struct IscsiTask));
+    iTask->co = qemu_coroutine_self();
+    iTask->retries = ISCSI_CMD_RETRIES;
+}
 
 static void
 iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void *command_data,
@@ -807,6 +855,79 @@ iscsi_getlength(BlockDriverState *bs)
     return len;
 }
 
+static int coroutine_fn iscsi_co_is_allocated(BlockDriverState *bs,
+                                              int64_t sector_num,
+                                              int nb_sectors, int *pnum)
+{
+    IscsiLun *iscsilun = bs->opaque;
+    struct scsi_get_lba_status *lbas = NULL;
+    struct scsi_lba_status_descriptor *lbasd = NULL;
+    struct IscsiTask iTask;
+    int ret;
+
+    /* LUN does not support logical block provisioning */
+    if (iscsilun->lbpme == 0) {
+        *pnum = nb_sectors;
+        return 1;
+    }
+
+    iscsi_co_init_iscsitask(iscsilun, &iTask);
+
+retry:
+    if (iscsi_get_lba_status_task(iscsilun->iscsi, iscsilun->lun,
+                                  sector_qemu2lun(sector_num, iscsilun),
+                                  8 + 16, iscsi_co_generic_cb,
+                                  &iTask) == NULL) {
+        goto fail;
+    }
+
+    while (!iTask.complete) {
+        iscsi_set_events(iscsilun);
+        qemu_coroutine_yield();
+    }
+
+    if (iTask.do_retry) {
+        goto retry;
+    }
+
+    if (iTask.status != SCSI_STATUS_GOOD) {
+        /* in case the get_lba_status_callout fails (i.e.
+         * because the device is busy or the cmd is not
+         * supported) we pretend all blocks are allocated
+         * for backwards compatiblity */
+        *pnum = nb_sectors;
+        return 1;
+    }
+
+    lbas = scsi_datain_unmarshall(iTask.task);
+    if (lbas == NULL) {
+        goto fail;
+    }
+
+    lbasd = &lbas->descriptors[0];
+
+    if (sector_qemu2lun(sector_num, iscsilun) != lbasd->lba) {
+        goto fail;
+    }
+
+    *pnum = lbasd->num_blocks * (iscsilun->block_size / BDRV_SECTOR_SIZE);
+    if (*pnum > nb_sectors) {
+        *pnum = nb_sectors;
+    }
+
+    ret = lbasd->provisioning == SCSI_PROVISIONING_TYPE_MAPPED ? 1 : 0;
+
+    scsi_free_scsi_task(iTask.task);
+    return ret;
+
+fail:
+    if (iTask.task != NULL) {
+        scsi_free_scsi_task(iTask.task);
+    }
+    *pnum = 0;
+    return 0;
+}
+
 static int parse_chap(struct iscsi_context *iscsi, const char *target)
 {
     QemuOptsList *list;
@@ -1347,6 +1468,8 @@ static BlockDriver bdrv_iscsi = {
     .bdrv_getlength  = iscsi_getlength,
     .bdrv_truncate   = iscsi_truncate,
 
+    .bdrv_co_is_allocated = iscsi_co_is_allocated,
+
     .bdrv_aio_readv  = iscsi_aio_readv,
     .bdrv_aio_writev = iscsi_aio_writev,
     .bdrv_aio_flush  = iscsi_aio_flush,
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCHv2 04/11] iscsi: add bdrv_co_write_zeroes
  2013-06-27 13:11 [Qemu-devel] [PATCHv2 00/11] iscsi/qemu-img/block-migration enhancements Peter Lieven
                   ` (2 preceding siblings ...)
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 03/11] iscsi: add bdrv_co_is_allocated Peter Lieven
@ 2013-06-27 13:11 ` Peter Lieven
  2013-07-10  9:54   ` Kevin Wolf
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 05/11] block: add bdrv_write_zeroes() Peter Lieven
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 69+ messages in thread
From: Peter Lieven @ 2013-06-27 13:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Peter Lieven, ronniesahlberg, stefanha

write zeroes is emulated by unmap if the target supports unmapping
an unmapped blocks read as zero. this emulation is only done if the
device was opened with BDRV_O_UNMAP and the request can be handled
within a single request. a failback to writev is performed otherwise.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block/iscsi.c |   44 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index d31ee95..92e66a6 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -928,6 +928,49 @@ fail:
     return 0;
 }
 
+static int
+coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
+                                   int nb_sectors)
+{
+    IscsiLun *iscsilun = bs->opaque;
+    struct IscsiTask iTask;
+    struct unmap_list list[1];
+
+    if (!iscsilun->lbprz || !iscsilun->lbpu || !(bs->open_flags & BDRV_O_UNMAP) ||
+        nb_sectors * BDRV_SECTOR_SIZE / iscsilun->block_size > iscsilun->max_unmap) {
+        /* fall back to writev */
+        return -ENOTSUP;
+    }
+
+    iscsi_co_init_iscsitask(iscsilun, &iTask);
+
+    list[0].lba = sector_qemu2lun(sector_num, iscsilun);
+    list[0].num = nb_sectors * BDRV_SECTOR_SIZE / iscsilun->block_size;
+
+retry:
+    if (iscsi_unmap_task(iscsilun->iscsi, iscsilun->lun, 0, 0, &list[0], 1,
+                         iscsi_co_generic_cb, &iTask) == NULL) {
+        return -EIO;
+    }
+
+    while (!iTask.complete) {
+        iscsi_set_events(iscsilun);
+        qemu_coroutine_yield();
+    }
+
+    if (iTask.do_retry) {
+        goto retry;
+    }
+
+    if (iTask.status != SCSI_STATUS_GOOD) {
+        return -EIO;
+    }
+
+    scsi_free_scsi_task(iTask.task);
+
+    return 0;
+}
+
 static int parse_chap(struct iscsi_context *iscsi, const char *target)
 {
     QemuOptsList *list;
@@ -1469,6 +1512,7 @@ static BlockDriver bdrv_iscsi = {
     .bdrv_truncate   = iscsi_truncate,
 
     .bdrv_co_is_allocated = iscsi_co_is_allocated,
+    .bdrv_co_write_zeroes = iscsi_co_write_zeroes,
 
     .bdrv_aio_readv  = iscsi_aio_readv,
     .bdrv_aio_writev = iscsi_aio_writev,
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCHv2 05/11] block: add bdrv_write_zeroes()
  2013-06-27 13:11 [Qemu-devel] [PATCHv2 00/11] iscsi/qemu-img/block-migration enhancements Peter Lieven
                   ` (3 preceding siblings ...)
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 04/11] iscsi: add bdrv_co_write_zeroes Peter Lieven
@ 2013-06-27 13:11 ` Peter Lieven
  2013-07-10  9:56   ` Kevin Wolf
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 06/11] block/raw: add bdrv_co_write_zeroes Peter Lieven
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 69+ messages in thread
From: Peter Lieven @ 2013-06-27 13:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Peter Lieven, ronniesahlberg, stefanha

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block.c               |   27 +++++++++++++++++++--------
 include/block/block.h |    2 ++
 2 files changed, 21 insertions(+), 8 deletions(-)

diff --git a/block.c b/block.c
index 8e77d46..08b7325 100644
--- a/block.c
+++ b/block.c
@@ -2163,6 +2163,7 @@ typedef struct RwCo {
     QEMUIOVector *qiov;
     bool is_write;
     int ret;
+    BdrvRequestFlags flags;
 } RwCo;
 
 static void coroutine_fn bdrv_rw_co_entry(void *opaque)
@@ -2171,10 +2172,12 @@ static void coroutine_fn bdrv_rw_co_entry(void *opaque)
 
     if (!rwco->is_write) {
         rwco->ret = bdrv_co_do_readv(rwco->bs, rwco->sector_num,
-                                     rwco->nb_sectors, rwco->qiov, 0);
+                                     rwco->nb_sectors, rwco->qiov,
+                                     rwco->flags);
     } else {
         rwco->ret = bdrv_co_do_writev(rwco->bs, rwco->sector_num,
-                                      rwco->nb_sectors, rwco->qiov, 0);
+                                      rwco->nb_sectors, rwco->qiov,
+                                      rwco->flags);
     }
 }
 
@@ -2182,7 +2185,8 @@ static void coroutine_fn bdrv_rw_co_entry(void *opaque)
  * Process a vectored synchronous request using coroutines
  */
 static int bdrv_rwv_co(BlockDriverState *bs, int64_t sector_num,
-                       QEMUIOVector *qiov, bool is_write)
+                       QEMUIOVector *qiov, bool is_write,
+                       BdrvRequestFlags flags)
 {
     Coroutine *co;
     RwCo rwco = {
@@ -2192,6 +2196,7 @@ static int bdrv_rwv_co(BlockDriverState *bs, int64_t sector_num,
         .qiov = qiov,
         .is_write = is_write,
         .ret = NOT_DONE,
+        .flags = flags,
     };
     assert((qiov->size & (BDRV_SECTOR_SIZE - 1)) == 0);
 
@@ -2223,7 +2228,7 @@ static int bdrv_rwv_co(BlockDriverState *bs, int64_t sector_num,
  * Process a synchronous request using coroutines
  */
 static int bdrv_rw_co(BlockDriverState *bs, int64_t sector_num, uint8_t *buf,
-                      int nb_sectors, bool is_write)
+                      int nb_sectors, bool is_write, BdrvRequestFlags flags)
 {
     QEMUIOVector qiov;
     struct iovec iov = {
@@ -2232,14 +2237,14 @@ static int bdrv_rw_co(BlockDriverState *bs, int64_t sector_num, uint8_t *buf,
     };
 
     qemu_iovec_init_external(&qiov, &iov, 1);
-    return bdrv_rwv_co(bs, sector_num, &qiov, is_write);
+    return bdrv_rwv_co(bs, sector_num, &qiov, is_write, flags);
 }
 
 /* return < 0 if error. See bdrv_write() for the return codes */
 int bdrv_read(BlockDriverState *bs, int64_t sector_num,
               uint8_t *buf, int nb_sectors)
 {
-    return bdrv_rw_co(bs, sector_num, buf, nb_sectors, false);
+    return bdrv_rw_co(bs, sector_num, buf, nb_sectors, false, 0);
 }
 
 /* Just like bdrv_read(), but with I/O throttling temporarily disabled */
@@ -2265,12 +2270,18 @@ int bdrv_read_unthrottled(BlockDriverState *bs, int64_t sector_num,
 int bdrv_write(BlockDriverState *bs, int64_t sector_num,
                const uint8_t *buf, int nb_sectors)
 {
-    return bdrv_rw_co(bs, sector_num, (uint8_t *)buf, nb_sectors, true);
+    return bdrv_rw_co(bs, sector_num, (uint8_t *)buf, nb_sectors, true, 0);
 }
 
 int bdrv_writev(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *qiov)
 {
-    return bdrv_rwv_co(bs, sector_num, qiov, true);
+    return bdrv_rwv_co(bs, sector_num, qiov, true, 0);
+}
+
+int bdrv_write_zeroes(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
+{
+    return bdrv_rw_co(bs, sector_num, NULL, nb_sectors, true,
+                      BDRV_REQ_ZERO_WRITE);
 }
 
 int bdrv_pread(BlockDriverState *bs, int64_t offset,
diff --git a/include/block/block.h b/include/block/block.h
index 2307f67..5a062fa 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -156,6 +156,8 @@ int bdrv_read_unthrottled(BlockDriverState *bs, int64_t sector_num,
                           uint8_t *buf, int nb_sectors);
 int bdrv_write(BlockDriverState *bs, int64_t sector_num,
                const uint8_t *buf, int nb_sectors);
+int bdrv_write_zeroes(BlockDriverState *bs, int64_t sector_num,
+               int nb_sectors);
 int bdrv_writev(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *qiov);
 int bdrv_pread(BlockDriverState *bs, int64_t offset,
                void *buf, int count);
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCHv2 06/11] block/raw: add bdrv_co_write_zeroes
  2013-06-27 13:11 [Qemu-devel] [PATCHv2 00/11] iscsi/qemu-img/block-migration enhancements Peter Lieven
                   ` (4 preceding siblings ...)
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 05/11] block: add bdrv_write_zeroes() Peter Lieven
@ 2013-06-27 13:11 ` Peter Lieven
  2013-07-10  9:57   ` Kevin Wolf
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 07/11] iscsi: let bdrv_create conditionally zero out the device Peter Lieven
                   ` (4 subsequent siblings)
  10 siblings, 1 reply; 69+ messages in thread
From: Peter Lieven @ 2013-06-27 13:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Peter Lieven, ronniesahlberg, stefanha

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block/raw.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/block/raw.c b/block/raw.c
index ce10422..8c81de9 100644
--- a/block/raw.c
+++ b/block/raw.c
@@ -42,6 +42,13 @@ static int coroutine_fn raw_co_is_allocated(BlockDriverState *bs,
     return bdrv_co_is_allocated(bs->file, sector_num, nb_sectors, pnum);
 }
 
+static int coroutine_fn raw_co_write_zeroes(BlockDriverState *bs,
+                                            int64_t sector_num,
+                                            int nb_sectors)
+{
+    return bdrv_co_write_zeroes(bs->file, sector_num, nb_sectors);
+}
+
 static int64_t raw_getlength(BlockDriverState *bs)
 {
     return bdrv_getlength(bs->file);
@@ -128,6 +135,7 @@ static BlockDriver bdrv_raw = {
     .bdrv_co_readv          = raw_co_readv,
     .bdrv_co_writev         = raw_co_writev,
     .bdrv_co_is_allocated   = raw_co_is_allocated,
+    .bdrv_co_write_zeroes   = raw_co_write_zeroes,
     .bdrv_co_discard        = raw_co_discard,
 
     .bdrv_probe         = raw_probe,
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCHv2 07/11] iscsi: let bdrv_create conditionally zero out the device
  2013-06-27 13:11 [Qemu-devel] [PATCHv2 00/11] iscsi/qemu-img/block-migration enhancements Peter Lieven
                   ` (5 preceding siblings ...)
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 06/11] block/raw: add bdrv_co_write_zeroes Peter Lieven
@ 2013-06-27 13:11 ` Peter Lieven
  2013-07-01 13:58   ` Stefan Hajnoczi
                     ` (2 more replies)
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 08/11] block-migration: efficiently encode zero blocks Peter Lieven
                   ` (3 subsequent siblings)
  10 siblings, 3 replies; 69+ messages in thread
From: Peter Lieven @ 2013-06-27 13:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Peter Lieven, ronniesahlberg, stefanha

if the device supports unmapping and unmapped blocks read as
zero ensure that the whole device is unmapped and report
.has_zero_init = 1 in this case to speed up qemu-img convert.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block/iscsi.c |   72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 71 insertions(+), 1 deletion(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 92e66a6..621ca40 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1436,7 +1436,8 @@ static int iscsi_truncate(BlockDriverState *bs, int64_t offset)
 
 static int iscsi_has_zero_init(BlockDriverState *bs)
 {
-    return 0;
+    IscsiLun *iscsilun = bs->opaque;
+    return (iscsilun->lbpu && iscsilun->lbprz) ? 1 : 0;
 }
 
 static int iscsi_create(const char *filename, QEMUOptionParameter *options)
@@ -1446,6 +1447,7 @@ static int iscsi_create(const char *filename, QEMUOptionParameter *options)
     BlockDriverState bs;
     IscsiLun *iscsilun = NULL;
     QDict *bs_options;
+    struct scsi_task *task = NULL;
 
     memset(&bs, 0, sizeof(BlockDriverState));
 
@@ -1481,7 +1483,75 @@ static int iscsi_create(const char *filename, QEMUOptionParameter *options)
     }
 
     ret = 0;
+
+    if (iscsilun->lbpu && iscsilun->lbprz) {
+        uint64_t lba = 0;
+        while (lba < iscsilun->num_blocks) {
+            struct scsi_get_lba_status *lbas = NULL;
+            struct scsi_lba_status_descriptor *lbasd = NULL;
+            enum scsi_provisioning_type provisioning;
+            uint32_t nb_sectors;
+
+            task = iscsi_get_lba_status_sync(iscsilun->iscsi, iscsilun->lun,
+                                             lba, 8 + 16);
+            if (task == NULL || task->status != SCSI_STATUS_GOOD) {
+                error_report("iSCSI: Failed to get_lba_status on iSCSI lun. %s",
+                             iscsi_get_error(iscsilun->iscsi));
+                ret = -EINVAL;
+                goto out;
+            }
+
+            lbas = scsi_datain_unmarshall(task);
+            if (lbas == NULL) {
+                error_report("iSCSI: failed to unmarshall inquiry datain blob");
+                ret = -EINVAL;
+                goto out;
+            }
+            lbasd = &lbas->descriptors[0];
+            if (lbasd->lba != lba) {
+                ret = -EINVAL;
+                goto out;
+            }
+            nb_sectors = lbasd->num_blocks;
+            provisioning = lbasd->provisioning;
+            scsi_free_scsi_task(task);
+            task = NULL;
+
+            /* blocks from lba to lba + nb_sectors - 1 are not mapped
+             * and read as zero (lbprz==1) so we can skip them */
+            if (provisioning != SCSI_PROVISIONING_TYPE_MAPPED) {
+                lba += nb_sectors;
+                continue;
+            }
+
+            uint64_t lba2 = lba + nb_sectors;
+            while (lba < lba2) {
+                struct unmap_list list[1];
+                list[0].lba = lba;
+                list[0].num = iscsilun->max_unmap;
+                if (lba + list[0].num > iscsilun->num_blocks) {
+                    list[0].num = iscsilun->num_blocks - lba;
+                }
+                task = iscsi_unmap_sync(iscsilun->iscsi,
+                                        iscsilun->lun,
+                                        0, 0, &list[0], 1);
+                if (task == NULL || task->status != SCSI_STATUS_GOOD) {
+                    error_report("iSCSI: Failed to unmap data on iSCSI lun. %s",
+                                 iscsi_get_error(iscsilun->iscsi));
+                    ret = -EINVAL;
+                    goto out;
+                }
+                scsi_free_scsi_task(task);
+                task = NULL;
+                lba += list[0].num;
+            }
+        }
+    }
+
 out:
+    if (task) {
+        scsi_free_scsi_task(task);
+    }
     if (iscsilun->iscsi != NULL) {
         iscsi_destroy_context(iscsilun->iscsi);
     }
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCHv2 08/11] block-migration: efficiently encode zero blocks
  2013-06-27 13:11 [Qemu-devel] [PATCHv2 00/11] iscsi/qemu-img/block-migration enhancements Peter Lieven
                   ` (6 preceding siblings ...)
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 07/11] iscsi: let bdrv_create conditionally zero out the device Peter Lieven
@ 2013-06-27 13:11 ` Peter Lieven
  2013-07-01 14:13   ` Stefan Hajnoczi
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 09/11] iscsi: factor out sector conversions Peter Lieven
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 69+ messages in thread
From: Peter Lieven @ 2013-06-27 13:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Peter Lieven, ronniesahlberg, stefanha

this patch adds a efficient encoding for zero blocks by
adding a new flag indiciating a block is completly zero.

additionally bdrv_write_zeros() is used at the destination
to efficiently write these zeroes. if the driver supports
it this avoids blindly allocating all sectors consumed by
zero blocks effectively re-thinning the device.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block-migration.c             |   29 +++++++++++++++++++++++------
 include/migration/qemu-file.h |    1 +
 savevm.c                      |    2 +-
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index 2fd7699..99b3757 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -29,6 +29,7 @@
 #define BLK_MIG_FLAG_DEVICE_BLOCK       0x01
 #define BLK_MIG_FLAG_EOS                0x02
 #define BLK_MIG_FLAG_PROGRESS           0x04
+#define BLK_MIG_FLAG_ZERO_BLOCK         0x08
 
 #define MAX_IS_ALLOCATED_SEARCH 65536
 
@@ -114,16 +115,29 @@ static void blk_mig_unlock(void)
 static void blk_send(QEMUFile *f, BlkMigBlock * blk)
 {
     int len;
+    int flags = BLK_MIG_FLAG_DEVICE_BLOCK;
+    
+    if (buffer_is_zero(blk->buf, BLOCK_SIZE)) {
+        flags |= BLK_MIG_FLAG_ZERO_BLOCK;
+    }
 
     /* sector number and flags */
     qemu_put_be64(f, (blk->sector << BDRV_SECTOR_BITS)
-                     | BLK_MIG_FLAG_DEVICE_BLOCK);
+                     | flags);
 
     /* device name */
     len = strlen(blk->bmds->bs->device_name);
     qemu_put_byte(f, len);
     qemu_put_buffer(f, (uint8_t *)blk->bmds->bs->device_name, len);
 
+    /* if a block is zero we need to flush here since the network
+     * bandwidth is now a lot higher than the storage device bandwidth.
+     * thus if we queue zero blocks we slow down the migration */
+    if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
+        qemu_fflush(f);
+        return;
+    }
+
     qemu_put_buffer(f, blk->buf, BLOCK_SIZE);
 }
 
@@ -762,12 +776,15 @@ static int block_load(QEMUFile *f, void *opaque, int version_id)
                 nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
             }
 
-            buf = g_malloc(BLOCK_SIZE);
-
-            qemu_get_buffer(f, buf, BLOCK_SIZE);
-            ret = bdrv_write(bs, addr, buf, nr_sectors);
+            if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
+                ret = bdrv_write_zeroes(bs, addr, nr_sectors);
+            } else {
+                buf = g_malloc(BLOCK_SIZE);
+                qemu_get_buffer(f, buf, BLOCK_SIZE);
+                ret = bdrv_write(bs, addr, buf, nr_sectors);
+                g_free(buf);
+            }
 
-            g_free(buf);
             if (ret < 0) {
                 return ret;
             }
diff --git a/include/migration/qemu-file.h b/include/migration/qemu-file.h
index 7519464..b73298d 100644
--- a/include/migration/qemu-file.h
+++ b/include/migration/qemu-file.h
@@ -71,6 +71,7 @@ QEMUFile *qemu_fdopen(int fd, const char *mode);
 QEMUFile *qemu_fopen_socket(int fd, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_get_fd(QEMUFile *f);
+void qemu_fflush(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 int64_t qemu_ftell(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
diff --git a/savevm.c b/savevm.c
index 48cc2a9..4d898af 100644
--- a/savevm.c
+++ b/savevm.c
@@ -611,7 +611,7 @@ static inline bool qemu_file_is_writable(QEMUFile *f)
  * If there is writev_buffer QEMUFileOps it uses it otherwise uses
  * put_buffer ops.
  */
-static void qemu_fflush(QEMUFile *f)
+void qemu_fflush(QEMUFile *f)
 {
     ssize_t ret = 0;
 
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCHv2 09/11] iscsi: factor out sector conversions
  2013-06-27 13:11 [Qemu-devel] [PATCHv2 00/11] iscsi/qemu-img/block-migration enhancements Peter Lieven
                   ` (7 preceding siblings ...)
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 08/11] block-migration: efficiently encode zero blocks Peter Lieven
@ 2013-06-27 13:11 ` Peter Lieven
  2013-07-10 11:29   ` Kevin Wolf
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 10/11] iscsi: ignore aio_discard if unsupported Peter Lieven
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 11/11] iscsi: assert that sectors are aligned to LUN blocksize Peter Lieven
  10 siblings, 1 reply; 69+ messages in thread
From: Peter Lieven @ 2013-06-27 13:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Peter Lieven, ronniesahlberg, stefanha

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block/iscsi.c |   18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 621ca40..e9ecfce 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -293,6 +293,11 @@ static int64_t sector_qemu2lun(int64_t sector, IscsiLun *iscsilun)
     return sector * BDRV_SECTOR_SIZE / iscsilun->block_size;
 }
 
+static int64_t sector_lun2qemu(int64_t sector, IscsiLun *iscsilun)
+{
+    return sector * iscsilun->block_size / BDRV_SECTOR_SIZE;
+}
+
 static int
 iscsi_aio_writev_acb(IscsiAIOCB *acb)
 {
@@ -340,7 +345,7 @@ iscsi_aio_writev_acb(IscsiAIOCB *acb)
     lba = sector_qemu2lun(acb->sector_num, acb->iscsilun);
     *(uint32_t *)&acb->task->cdb[2]  = htonl(lba >> 32);
     *(uint32_t *)&acb->task->cdb[6]  = htonl(lba & 0xffffffff);
-    num_sectors = size / acb->iscsilun->block_size;
+    num_sectors = sector_qemu2lun(acb->nb_sectors, acb->iscsilun);
     *(uint32_t *)&acb->task->cdb[10] = htonl(num_sectors);
     acb->task->expxferlen = size;
 
@@ -659,7 +664,7 @@ static int iscsi_aio_discard_acb(IscsiAIOCB *acb) {
     acb->buf        = NULL;
 
     list[0].lba = sector_qemu2lun(acb->sector_num, acb->iscsilun);
-    list[0].num = acb->nb_sectors * BDRV_SECTOR_SIZE / acb->iscsilun->block_size;
+    list[0].num = sector_qemu2lun(acb->nb_sectors, acb->iscsilun);
 
     acb->task = iscsi_unmap_task(iscsi, acb->iscsilun->lun,
                                  0, 0, &list[0], 1,
@@ -910,7 +915,7 @@ retry:
         goto fail;
     }
 
-    *pnum = lbasd->num_blocks * (iscsilun->block_size / BDRV_SECTOR_SIZE);
+    *pnum = sector_lun2qemu(lbasd->num_blocks, iscsilun);
     if (*pnum > nb_sectors) {
         *pnum = nb_sectors;
     }
@@ -937,7 +942,7 @@ coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
     struct unmap_list list[1];
 
     if (!iscsilun->lbprz || !iscsilun->lbpu || !(bs->open_flags & BDRV_O_UNMAP) ||
-        nb_sectors * BDRV_SECTOR_SIZE / iscsilun->block_size > iscsilun->max_unmap) {
+        sector_qemu2lun(nb_sectors, iscsilun) > iscsilun->max_unmap) {
         /* fall back to writev */
         return -ENOTSUP;
     }
@@ -945,7 +950,7 @@ coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
     iscsi_co_init_iscsitask(iscsilun, &iTask);
 
     list[0].lba = sector_qemu2lun(sector_num, iscsilun);
-    list[0].num = nb_sectors * BDRV_SECTOR_SIZE / iscsilun->block_size;
+    list[0].num = sector_qemu2lun(nb_sectors, iscsilun);
 
 retry:
     if (iscsi_unmap_task(iscsilun->iscsi, iscsilun->lun, 0, 0, &list[0], 1,
@@ -1322,8 +1327,7 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags)
     if ((ret = iscsi_readcapacity_sync(iscsilun)) != 0) {
         goto out;
     }
-    bs->total_sectors    = iscsilun->num_blocks *
-                           iscsilun->block_size / BDRV_SECTOR_SIZE ;
+    bs->total_sectors    = sector_lun2qemu(iscsilun->num_blocks, iscsilun);
 
     /* Medium changer or tape. We dont have any emulation for this so this must
      * be sg ioctl compatible. We force it to be sg, otherwise qemu will try
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCHv2 10/11] iscsi: ignore aio_discard if unsupported
  2013-06-27 13:11 [Qemu-devel] [PATCHv2 00/11] iscsi/qemu-img/block-migration enhancements Peter Lieven
                   ` (8 preceding siblings ...)
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 09/11] iscsi: factor out sector conversions Peter Lieven
@ 2013-06-27 13:11 ` Peter Lieven
  2013-07-10 11:33   ` Kevin Wolf
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 11/11] iscsi: assert that sectors are aligned to LUN blocksize Peter Lieven
  10 siblings, 1 reply; 69+ messages in thread
From: Peter Lieven @ 2013-06-27 13:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Peter Lieven, ronniesahlberg, stefanha

if the target does not support UNMAP or the request
is too big silently ignore the discard request.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block/iscsi.c |    8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index e9ecfce..0567b46 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -694,6 +694,14 @@ iscsi_aio_discard(BlockDriverState *bs,
     acb->sector_num  = sector_num;
     acb->retries     = ISCSI_CMD_RETRIES;
 
+    if (!iscsilun->lbpu ||
+        sector_qemu2lun(nb_sectors, iscsilun) > iscsilun->max_unmap) {
+        /* silently ignore discard request */
+        acb->status = 0;
+        iscsi_schedule_bh(acb);
+        return &acb->common;
+    }
+
     if (iscsi_aio_discard_acb(acb) != 0) {
         qemu_aio_release(acb);
         return NULL;
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* [Qemu-devel] [PATCHv2 11/11] iscsi: assert that sectors are aligned to LUN blocksize
  2013-06-27 13:11 [Qemu-devel] [PATCHv2 00/11] iscsi/qemu-img/block-migration enhancements Peter Lieven
                   ` (9 preceding siblings ...)
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 10/11] iscsi: ignore aio_discard if unsupported Peter Lieven
@ 2013-06-27 13:11 ` Peter Lieven
  2013-07-01 14:35   ` Stefan Hajnoczi
  2013-07-10 11:38   ` Kevin Wolf
  10 siblings, 2 replies; 69+ messages in thread
From: Peter Lieven @ 2013-06-27 13:11 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, Peter Lieven, ronniesahlberg, stefanha

if the blocksize of an iSCSI LUN is bigger than the BDRV_SECTOR_SIZE
it is possible that sector_num or nb_sectors are not correctly
alligned.

to avoid corruption we fail requests which are misaligned.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 block/iscsi.c |   27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/block/iscsi.c b/block/iscsi.c
index 0567b46..bff2e1f 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -298,6 +298,13 @@ static int64_t sector_lun2qemu(int64_t sector, IscsiLun *iscsilun)
     return sector * iscsilun->block_size / BDRV_SECTOR_SIZE;
 }
 
+static int64_t is_request_lun_aligned(int64_t sector_num, int nb_sectors,
+                                      IscsiLun *iscsilun)
+{
+    return ((sector_num * BDRV_SECTOR_SIZE) % iscsilun->block_size ||
+            (nb_sectors * BDRV_SECTOR_SIZE) % iscsilun->block_size) ? 0 : 1;
+}
+
 static int
 iscsi_aio_writev_acb(IscsiAIOCB *acb)
 {
@@ -382,6 +389,10 @@ iscsi_aio_writev(BlockDriverState *bs, int64_t sector_num,
     IscsiLun *iscsilun = bs->opaque;
     IscsiAIOCB *acb;
 
+    if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
+        return NULL;
+    }
+
     acb = qemu_aio_get(&iscsi_aiocb_info, bs, cb, opaque);
     trace_iscsi_aio_writev(iscsilun->iscsi, sector_num, nb_sectors, opaque, acb);
 
@@ -524,6 +535,10 @@ iscsi_aio_readv(BlockDriverState *bs, int64_t sector_num,
     IscsiLun *iscsilun = bs->opaque;
     IscsiAIOCB *acb;
 
+    if (!is_request_lun_aligned(sector_num, 0, iscsilun)) {
+        return NULL;
+    }
+
     acb = qemu_aio_get(&iscsi_aiocb_info, bs, cb, opaque);
     trace_iscsi_aio_readv(iscsilun->iscsi, sector_num, nb_sectors, opaque, acb);
 
@@ -687,6 +702,10 @@ iscsi_aio_discard(BlockDriverState *bs,
     IscsiLun *iscsilun = bs->opaque;
     IscsiAIOCB *acb;
 
+    if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
+        return NULL;
+    }
+
     acb = qemu_aio_get(&iscsi_aiocb_info, bs, cb, opaque);
 
     acb->iscsilun    = iscsilun;
@@ -878,6 +897,10 @@ static int coroutine_fn iscsi_co_is_allocated(BlockDriverState *bs,
     struct IscsiTask iTask;
     int ret;
 
+    if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
+        goto fail;
+    }
+
     /* LUN does not support logical block provisioning */
     if (iscsilun->lbpme == 0) {
         *pnum = nb_sectors;
@@ -949,6 +972,10 @@ coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
     struct IscsiTask iTask;
     struct unmap_list list[1];
 
+    if (!is_request_lun_aligned(sector_num, nb_sectors, iscsilun)) {
+        return -EIO;
+    }
+
     if (!iscsilun->lbprz || !iscsilun->lbpu || !(bs->open_flags & BDRV_O_UNMAP) ||
         sector_qemu2lun(nb_sectors, iscsilun) > iscsilun->max_unmap) {
         /* fall back to writev */
-- 
1.7.9.5

^ permalink raw reply related	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 01/11] iscsi: add logical block provisioning information to iscsilun
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 01/11] iscsi: add logical block provisioning information to iscsilun Peter Lieven
@ 2013-07-01 13:35   ` Stefan Hajnoczi
  2013-07-01 16:08     ` Peter Lieven
  2013-07-10  9:19   ` Kevin Wolf
  1 sibling, 1 reply; 69+ messages in thread
From: Stefan Hajnoczi @ 2013-07-01 13:35 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, pbonzini, qemu-devel, ronniesahlberg

On Thu, Jun 27, 2013 at 03:11:25PM +0200, Peter Lieven wrote:
> @@ -1130,6 +1137,46 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags)
>          bs->sg = 1;
>      }
>  
> +    if (iscsilun->lbpme) {
> +        struct scsi_inquiry_logical_block_provisioning *inq_lbp;
> +        int full_size;
> +
> +        task = iscsi_inquiry_sync(iscsi, iscsilun->lun, 1,
> +                                  SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING,
> +                                  64);

Does this leak the previous task from line 1101?

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 03/11] iscsi: add bdrv_co_is_allocated
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 03/11] iscsi: add bdrv_co_is_allocated Peter Lieven
@ 2013-07-01 13:46   ` Stefan Hajnoczi
  2013-07-01 16:00     ` Peter Lieven
  2013-07-10  9:41   ` Kevin Wolf
  1 sibling, 1 reply; 69+ messages in thread
From: Stefan Hajnoczi @ 2013-07-01 13:46 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, pbonzini, qemu-devel, ronniesahlberg

On Thu, Jun 27, 2013 at 03:11:27PM +0200, Peter Lieven wrote:
> @@ -113,6 +122,45 @@ iscsi_schedule_bh(IscsiAIOCB *acb)
>      qemu_bh_schedule(acb->bh);
>  }
>  
> +static void
> +iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
> +                        void *command_data, void *opaque)
> +{
> +    struct IscsiTask *iTask = opaque;
> +    struct scsi_task *task = command_data;
> +
> +    iTask->complete = 1;
> +    iTask->status = status;
> +    iTask->do_retry = 0;
> +    iTask->task = task;
> +
> +    if (iTask->retries-- > 0 && status == SCSI_STATUS_CHECK_CONDITION
> +        && task->sense.key == SCSI_SENSE_UNIT_ATTENTION) {
> +        iTask->do_retry = 1;
> +        goto out;
> +    }
> +
> +    if (status != SCSI_STATUS_GOOD) {
> +        error_report("iSCSI: Failure. %s", iscsi_get_error(iscsi));
> +        goto out;
> +    }
> +
> +out:
> +    if (iTask->status != SCSI_STATUS_GOOD) {
> +        scsi_free_scsi_task(iTask->task);
> +        iTask->task = NULL;
> +    }

Not sure about freeing the task here since the caller is still
responsible for freeing the task in the success case and higher-level
error cases.

I suggest keeping iTask->task alive here so the caller is easier to
maintain (it can use a single out/error path).

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 07/11] iscsi: let bdrv_create conditionally zero out the device
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 07/11] iscsi: let bdrv_create conditionally zero out the device Peter Lieven
@ 2013-07-01 13:58   ` Stefan Hajnoczi
  2013-07-01 20:20   ` Paolo Bonzini
  2013-07-10 10:14   ` Kevin Wolf
  2 siblings, 0 replies; 69+ messages in thread
From: Stefan Hajnoczi @ 2013-07-01 13:58 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, pbonzini, qemu-devel, ronniesahlberg

On Thu, Jun 27, 2013 at 03:11:31PM +0200, Peter Lieven wrote:
> @@ -1481,7 +1483,75 @@ static int iscsi_create(const char *filename, QEMUOptionParameter *options)
>      }
>  
>      ret = 0;
> +
> +    if (iscsilun->lbpu && iscsilun->lbprz) {

Moving this block of code into a function would be nice.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 08/11] block-migration: efficiently encode zero blocks
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 08/11] block-migration: efficiently encode zero blocks Peter Lieven
@ 2013-07-01 14:13   ` Stefan Hajnoczi
  2013-07-01 15:55     ` Peter Lieven
  2013-07-01 16:09     ` Peter Lieven
  0 siblings, 2 replies; 69+ messages in thread
From: Stefan Hajnoczi @ 2013-07-01 14:13 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, pbonzini, qemu-devel, ronniesahlberg

On Thu, Jun 27, 2013 at 03:11:32PM +0200, Peter Lieven wrote:

This patch breaks cross-version blog migration.  We need to control
whether or not to use the new BLK_MIG_FLAG_ZERO_BLOCK flag.

> diff --git a/block-migration.c b/block-migration.c
> index 2fd7699..99b3757 100644
> --- a/block-migration.c
> +++ b/block-migration.c
> @@ -29,6 +29,7 @@
>  #define BLK_MIG_FLAG_DEVICE_BLOCK       0x01
>  #define BLK_MIG_FLAG_EOS                0x02
>  #define BLK_MIG_FLAG_PROGRESS           0x04
> +#define BLK_MIG_FLAG_ZERO_BLOCK         0x08
>  
>  #define MAX_IS_ALLOCATED_SEARCH 65536
>  
> @@ -114,16 +115,29 @@ static void blk_mig_unlock(void)
>  static void blk_send(QEMUFile *f, BlkMigBlock * blk)
>  {
>      int len;
> +    int flags = BLK_MIG_FLAG_DEVICE_BLOCK;
> +    
> +    if (buffer_is_zero(blk->buf, BLOCK_SIZE)) {
> +        flags |= BLK_MIG_FLAG_ZERO_BLOCK;
> +    }
>  
>      /* sector number and flags */
>      qemu_put_be64(f, (blk->sector << BDRV_SECTOR_BITS)
> -                     | BLK_MIG_FLAG_DEVICE_BLOCK);
> +                     | flags);

blk->sector is int64_t and flags is signed int.  This conversion will
sign-extend from 32-bit flags up to 64-bits.

Flags should be uint64_t or at least unsigned so that we don't hit
sign-extension when BLK_MIG_FLAG_x uses the top bit.

>  
>      /* device name */
>      len = strlen(blk->bmds->bs->device_name);
>      qemu_put_byte(f, len);
>      qemu_put_buffer(f, (uint8_t *)blk->bmds->bs->device_name, len);
>  
> +    /* if a block is zero we need to flush here since the network
> +     * bandwidth is now a lot higher than the storage device bandwidth.
> +     * thus if we queue zero blocks we slow down the migration */
> +    if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
> +        qemu_fflush(f);
> +        return;
> +    }

Not sure I understand this.  Is the problem that the other side may
require an slow writev() to fill zeros?  So you want to tell the
destination about the zeroes ASAP.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 11/11] iscsi: assert that sectors are aligned to LUN blocksize
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 11/11] iscsi: assert that sectors are aligned to LUN blocksize Peter Lieven
@ 2013-07-01 14:35   ` Stefan Hajnoczi
  2013-07-01 15:59     ` Peter Lieven
  2013-07-10 11:38   ` Kevin Wolf
  1 sibling, 1 reply; 69+ messages in thread
From: Stefan Hajnoczi @ 2013-07-01 14:35 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, pbonzini, qemu-devel, ronniesahlberg

On Thu, Jun 27, 2013 at 03:11:35PM +0200, Peter Lieven wrote:
> if the blocksize of an iSCSI LUN is bigger than the BDRV_SECTOR_SIZE
> it is possible that sector_num or nb_sectors are not correctly
> alligned.
> 
> to avoid corruption we fail requests which are misaligned.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  block/iscsi.c |   27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 0567b46..bff2e1f 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -298,6 +298,13 @@ static int64_t sector_lun2qemu(int64_t sector, IscsiLun *iscsilun)
>      return sector * iscsilun->block_size / BDRV_SECTOR_SIZE;
>  }
>  
> +static int64_t is_request_lun_aligned(int64_t sector_num, int nb_sectors,
> +                                      IscsiLun *iscsilun)
> +{
> +    return ((sector_num * BDRV_SECTOR_SIZE) % iscsilun->block_size ||
> +            (nb_sectors * BDRV_SECTOR_SIZE) % iscsilun->block_size) ? 0 : 1;
> +}

When QEMU does geometry probing it reads 2KB.  So if the LUN has 4KB
block size then QEMU will fail to open it?  This would also affect image
formats on top of iSCSI LUNs.

AFAICT we have no way to passing I/O topology information up from the
block driver.

Stefan

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 08/11] block-migration: efficiently encode zero blocks
  2013-07-01 14:13   ` Stefan Hajnoczi
@ 2013-07-01 15:55     ` Peter Lieven
  2013-07-02  7:40       ` Stefan Hajnoczi
  2013-07-01 16:09     ` Peter Lieven
  1 sibling, 1 reply; 69+ messages in thread
From: Peter Lieven @ 2013-07-01 15:55 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, pbonzini, qemu-devel, ronniesahlberg


Am 01.07.2013 um 16:13 schrieb Stefan Hajnoczi <stefanha@redhat.com>:

> On Thu, Jun 27, 2013 at 03:11:32PM +0200, Peter Lieven wrote:
> 
> This patch breaks cross-version blog migration.  We need to control
> whether or not to use the new BLK_MIG_FLAG_ZERO_BLOCK flag.

you are right the upgrade way works, but downgrade not. what is the
proposed way to fix this?

> 
>> diff --git a/block-migration.c b/block-migration.c
>> index 2fd7699..99b3757 100644
>> --- a/block-migration.c
>> +++ b/block-migration.c
>> @@ -29,6 +29,7 @@
>> #define BLK_MIG_FLAG_DEVICE_BLOCK       0x01
>> #define BLK_MIG_FLAG_EOS                0x02
>> #define BLK_MIG_FLAG_PROGRESS           0x04
>> +#define BLK_MIG_FLAG_ZERO_BLOCK         0x08
>> 
>> #define MAX_IS_ALLOCATED_SEARCH 65536
>> 
>> @@ -114,16 +115,29 @@ static void blk_mig_unlock(void)
>> static void blk_send(QEMUFile *f, BlkMigBlock * blk)
>> {
>>     int len;
>> +    int flags = BLK_MIG_FLAG_DEVICE_BLOCK;
>> +    
>> +    if (buffer_is_zero(blk->buf, BLOCK_SIZE)) {
>> +        flags |= BLK_MIG_FLAG_ZERO_BLOCK;
>> +    }
>> 
>>     /* sector number and flags */
>>     qemu_put_be64(f, (blk->sector << BDRV_SECTOR_BITS)
>> -                     | BLK_MIG_FLAG_DEVICE_BLOCK);
>> +                     | flags);
> 
> blk->sector is int64_t and flags is signed int.  This conversion will
> sign-extend from 32-bit flags up to 64-bits.
> 
> Flags should be uint64_t or at least unsigned so that we don't hit
> sign-extension when BLK_MIG_FLAG_x uses the top bit.

will fix this. thanks,

Peter

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 11/11] iscsi: assert that sectors are aligned to LUN blocksize
  2013-07-01 14:35   ` Stefan Hajnoczi
@ 2013-07-01 15:59     ` Peter Lieven
  2013-07-02  7:44       ` Stefan Hajnoczi
  0 siblings, 1 reply; 69+ messages in thread
From: Peter Lieven @ 2013-07-01 15:59 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, pbonzini, qemu-devel, ronniesahlberg


Am 01.07.2013 um 16:35 schrieb Stefan Hajnoczi <stefanha@redhat.com>:

> On Thu, Jun 27, 2013 at 03:11:35PM +0200, Peter Lieven wrote:
>> if the blocksize of an iSCSI LUN is bigger than the BDRV_SECTOR_SIZE
>> it is possible that sector_num or nb_sectors are not correctly
>> alligned.
>> 
>> to avoid corruption we fail requests which are misaligned.
>> 
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> block/iscsi.c |   27 +++++++++++++++++++++++++++
>> 1 file changed, 27 insertions(+)
>> 
>> diff --git a/block/iscsi.c b/block/iscsi.c
>> index 0567b46..bff2e1f 100644
>> --- a/block/iscsi.c
>> +++ b/block/iscsi.c
>> @@ -298,6 +298,13 @@ static int64_t sector_lun2qemu(int64_t sector, IscsiLun *iscsilun)
>>     return sector * iscsilun->block_size / BDRV_SECTOR_SIZE;
>> }
>> 
>> +static int64_t is_request_lun_aligned(int64_t sector_num, int nb_sectors,
>> +                                      IscsiLun *iscsilun)
>> +{
>> +    return ((sector_num * BDRV_SECTOR_SIZE) % iscsilun->block_size ||
>> +            (nb_sectors * BDRV_SECTOR_SIZE) % iscsilun->block_size) ? 0 : 1;
>> +}
> 
> When QEMU does geometry probing it reads 2KB.  So if the LUN has 4KB
> block size then QEMU will fail to open it?  This would also affect image
> formats on top of iSCSI LUNs.

opening a 4K LUN does not fail with my target. So writing unaligned sectors will
result in corruption. we should at least fail all those unaligned operations until
we have a fix or workaround in place in place.

> 
> AFAICT we have no way to passing I/O topology information up from the
> block driver.

not at the moment. Paolo had a patch series back in Dec 2011, but it never went
upstream. I asked him off list and he told me that 4K drives where not important
enough and the Redhat bug for this was closed. Now with 4K iSCSI targets these
old work could become important again.

It would be nice to find out if there was anything wrong with them or what has to
be done to get them integrated.

Peter

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 03/11] iscsi: add bdrv_co_is_allocated
  2013-07-01 13:46   ` Stefan Hajnoczi
@ 2013-07-01 16:00     ` Peter Lieven
  0 siblings, 0 replies; 69+ messages in thread
From: Peter Lieven @ 2013-07-01 16:00 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, pbonzini, qemu-devel, ronniesahlberg


Am 01.07.2013 um 15:46 schrieb Stefan Hajnoczi <stefanha@redhat.com>:

> On Thu, Jun 27, 2013 at 03:11:27PM +0200, Peter Lieven wrote:
>> @@ -113,6 +122,45 @@ iscsi_schedule_bh(IscsiAIOCB *acb)
>>     qemu_bh_schedule(acb->bh);
>> }
>> 
>> +static void
>> +iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
>> +                        void *command_data, void *opaque)
>> +{
>> +    struct IscsiTask *iTask = opaque;
>> +    struct scsi_task *task = command_data;
>> +
>> +    iTask->complete = 1;
>> +    iTask->status = status;
>> +    iTask->do_retry = 0;
>> +    iTask->task = task;
>> +
>> +    if (iTask->retries-- > 0 && status == SCSI_STATUS_CHECK_CONDITION
>> +        && task->sense.key == SCSI_SENSE_UNIT_ATTENTION) {
>> +        iTask->do_retry = 1;
>> +        goto out;
>> +    }
>> +
>> +    if (status != SCSI_STATUS_GOOD) {
>> +        error_report("iSCSI: Failure. %s", iscsi_get_error(iscsi));
>> +        goto out;
>> +    }
>> +
>> +out:
>> +    if (iTask->status != SCSI_STATUS_GOOD) {
>> +        scsi_free_scsi_task(iTask->task);
>> +        iTask->task = NULL;
>> +    }
> 
> Not sure about freeing the task here since the caller is still
> responsible for freeing the task in the success case and higher-level
> error cases.
> 
> I suggest keeping iTask->task alive here so the caller is easier to
> maintain (it can use a single out/error path).

ok, will change this in v3

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 01/11] iscsi: add logical block provisioning information to iscsilun
  2013-07-01 13:35   ` Stefan Hajnoczi
@ 2013-07-01 16:08     ` Peter Lieven
  0 siblings, 0 replies; 69+ messages in thread
From: Peter Lieven @ 2013-07-01 16:08 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, pbonzini, qemu-devel, ronniesahlberg


Am 01.07.2013 um 15:35 schrieb Stefan Hajnoczi <stefanha@redhat.com>:

> On Thu, Jun 27, 2013 at 03:11:25PM +0200, Peter Lieven wrote:
>> @@ -1130,6 +1137,46 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags)
>>         bs->sg = 1;
>>     }
>> 
>> +    if (iscsilun->lbpme) {
>> +        struct scsi_inquiry_logical_block_provisioning *inq_lbp;
>> +        int full_size;
>> +
>> +        task = iscsi_inquiry_sync(iscsi, iscsilun->lun, 1,
>> +                                  SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING,
>> +                                  64);
> 
> Does this leak the previous task from line 1101?

You are right. This is silently fixed in PATCH 2/11. I will put both patches in one in v3.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 08/11] block-migration: efficiently encode zero blocks
  2013-07-01 14:13   ` Stefan Hajnoczi
  2013-07-01 15:55     ` Peter Lieven
@ 2013-07-01 16:09     ` Peter Lieven
  2013-07-02  7:36       ` Stefan Hajnoczi
  1 sibling, 1 reply; 69+ messages in thread
From: Peter Lieven @ 2013-07-01 16:09 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, pbonzini, qemu-devel, ronniesahlberg


Am 01.07.2013 um 16:13 schrieb Stefan Hajnoczi <stefanha@redhat.com>:

> On Thu, Jun 27, 2013 at 03:11:32PM +0200, Peter Lieven wrote:
> 
> This patch breaks cross-version blog migration.  We need to control
> whether or not to use the new BLK_MIG_FLAG_ZERO_BLOCK flag.
> 
>> diff --git a/block-migration.c b/block-migration.c
>> index 2fd7699..99b3757 100644
>> --- a/block-migration.c
>> +++ b/block-migration.c
>> @@ -29,6 +29,7 @@
>> #define BLK_MIG_FLAG_DEVICE_BLOCK       0x01
>> #define BLK_MIG_FLAG_EOS                0x02
>> #define BLK_MIG_FLAG_PROGRESS           0x04
>> +#define BLK_MIG_FLAG_ZERO_BLOCK         0x08
>> 
>> #define MAX_IS_ALLOCATED_SEARCH 65536
>> 
>> @@ -114,16 +115,29 @@ static void blk_mig_unlock(void)
>> static void blk_send(QEMUFile *f, BlkMigBlock * blk)
>> {
>>     int len;
>> +    int flags = BLK_MIG_FLAG_DEVICE_BLOCK;
>> +    
>> +    if (buffer_is_zero(blk->buf, BLOCK_SIZE)) {
>> +        flags |= BLK_MIG_FLAG_ZERO_BLOCK;
>> +    }
>> 
>>     /* sector number and flags */
>>     qemu_put_be64(f, (blk->sector << BDRV_SECTOR_BITS)
>> -                     | BLK_MIG_FLAG_DEVICE_BLOCK);
>> +                     | flags);
> 
> blk->sector is int64_t and flags is signed int.  This conversion will
> sign-extend from 32-bit flags up to 64-bits.
> 
> Flags should be uint64_t or at least unsigned so that we don't hit
> sign-extension when BLK_MIG_FLAG_x uses the top bit.
> 
>> 
>>     /* device name */
>>     len = strlen(blk->bmds->bs->device_name);
>>     qemu_put_byte(f, len);
>>     qemu_put_buffer(f, (uint8_t *)blk->bmds->bs->device_name, len);
>> 
>> +    /* if a block is zero we need to flush here since the network
>> +     * bandwidth is now a lot higher than the storage device bandwidth.
>> +     * thus if we queue zero blocks we slow down the migration */
>> +    if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
>> +        qemu_fflush(f);
>> +        return;
>> +    }
> 
> Not sure I understand this.  Is the problem that the other side may
> require an slow writev() to fill zeros?  So you want to tell the
> destination about the zeroes ASAP.

Sorry, missed this question. Yes. If a lot of zero blocks is queued it delays
migration because the target is idle while the source is queuing zero blocks.

Peter

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 07/11] iscsi: let bdrv_create conditionally zero out the device
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 07/11] iscsi: let bdrv_create conditionally zero out the device Peter Lieven
  2013-07-01 13:58   ` Stefan Hajnoczi
@ 2013-07-01 20:20   ` Paolo Bonzini
  2013-07-01 21:36     ` Peter Lieven
  2013-07-10 10:14   ` Kevin Wolf
  2 siblings, 1 reply; 69+ messages in thread
From: Paolo Bonzini @ 2013-07-01 20:20 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, ronniesahlberg, qemu-devel, stefanha

Il 27/06/2013 15:11, Peter Lieven ha scritto:
> if the device supports unmapping and unmapped blocks read as
> zero ensure that the whole device is unmapped and report
> .has_zero_init = 1 in this case to speed up qemu-img convert.

This can still take several minutes.  Do you need any special timeout in
libiscsi?

Perhaps we can have a new "discard_zeroes" field in bdrv_get_info, and
the unmap functionality can be moved up to qemu-img convert?

Paolo

> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  block/iscsi.c |   72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 71 insertions(+), 1 deletion(-)
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 92e66a6..621ca40 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -1436,7 +1436,8 @@ static int iscsi_truncate(BlockDriverState *bs, int64_t offset)
>  
>  static int iscsi_has_zero_init(BlockDriverState *bs)
>  {
> -    return 0;
> +    IscsiLun *iscsilun = bs->opaque;
> +    return (iscsilun->lbpu && iscsilun->lbprz) ? 1 : 0;
>  }
>  
>  static int iscsi_create(const char *filename, QEMUOptionParameter *options)
> @@ -1446,6 +1447,7 @@ static int iscsi_create(const char *filename, QEMUOptionParameter *options)
>      BlockDriverState bs;
>      IscsiLun *iscsilun = NULL;
>      QDict *bs_options;
> +    struct scsi_task *task = NULL;
>  
>      memset(&bs, 0, sizeof(BlockDriverState));
>  
> @@ -1481,7 +1483,75 @@ static int iscsi_create(const char *filename, QEMUOptionParameter *options)
>      }
>  
>      ret = 0;
> +
> +    if (iscsilun->lbpu && iscsilun->lbprz) {
> +        uint64_t lba = 0;
> +        while (lba < iscsilun->num_blocks) {
> +            struct scsi_get_lba_status *lbas = NULL;
> +            struct scsi_lba_status_descriptor *lbasd = NULL;
> +            enum scsi_provisioning_type provisioning;
> +            uint32_t nb_sectors;
> +
> +            task = iscsi_get_lba_status_sync(iscsilun->iscsi, iscsilun->lun,
> +                                             lba, 8 + 16);
> +            if (task == NULL || task->status != SCSI_STATUS_GOOD) {
> +                error_report("iSCSI: Failed to get_lba_status on iSCSI lun. %s",
> +                             iscsi_get_error(iscsilun->iscsi));
> +                ret = -EINVAL;
> +                goto out;
> +            }
> +
> +            lbas = scsi_datain_unmarshall(task);
> +            if (lbas == NULL) {
> +                error_report("iSCSI: failed to unmarshall inquiry datain blob");
> +                ret = -EINVAL;
> +                goto out;
> +            }
> +            lbasd = &lbas->descriptors[0];
> +            if (lbasd->lba != lba) {
> +                ret = -EINVAL;
> +                goto out;
> +            }
> +            nb_sectors = lbasd->num_blocks;
> +            provisioning = lbasd->provisioning;
> +            scsi_free_scsi_task(task);
> +            task = NULL;
> +
> +            /* blocks from lba to lba + nb_sectors - 1 are not mapped
> +             * and read as zero (lbprz==1) so we can skip them */
> +            if (provisioning != SCSI_PROVISIONING_TYPE_MAPPED) {
> +                lba += nb_sectors;
> +                continue;
> +            }
> +
> +            uint64_t lba2 = lba + nb_sectors;
> +            while (lba < lba2) {
> +                struct unmap_list list[1];
> +                list[0].lba = lba;
> +                list[0].num = iscsilun->max_unmap;
> +                if (lba + list[0].num > iscsilun->num_blocks) {
> +                    list[0].num = iscsilun->num_blocks - lba;
> +                }
> +                task = iscsi_unmap_sync(iscsilun->iscsi,
> +                                        iscsilun->lun,
> +                                        0, 0, &list[0], 1);
> +                if (task == NULL || task->status != SCSI_STATUS_GOOD) {
> +                    error_report("iSCSI: Failed to unmap data on iSCSI lun. %s",
> +                                 iscsi_get_error(iscsilun->iscsi));
> +                    ret = -EINVAL;
> +                    goto out;
> +                }
> +                scsi_free_scsi_task(task);
> +                task = NULL;
> +                lba += list[0].num;
> +            }
> +        }
> +    }
> +
>  out:
> +    if (task) {
> +        scsi_free_scsi_task(task);
> +    }
>      if (iscsilun->iscsi != NULL) {
>          iscsi_destroy_context(iscsilun->iscsi);
>      }
> 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 07/11] iscsi: let bdrv_create conditionally zero out the device
  2013-07-01 20:20   ` Paolo Bonzini
@ 2013-07-01 21:36     ` Peter Lieven
  2013-07-02  9:22       ` Paolo Bonzini
  0 siblings, 1 reply; 69+ messages in thread
From: Peter Lieven @ 2013-07-01 21:36 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, ronniesahlberg, qemu-devel, stefanha


Am 01.07.2013 um 22:20 schrieb Paolo Bonzini <pbonzini@redhat.com>:

> Il 27/06/2013 15:11, Peter Lieven ha scritto:
>> if the device supports unmapping and unmapped blocks read as
>> zero ensure that the whole device is unmapped and report
>> .has_zero_init = 1 in this case to speed up qemu-img convert.
> 
> This can still take several minutes.  Do you need any special timeout in
> libiscsi?

Not as far as I know. The number of unmapped sectors is bound by iscsilun->max_unmap.
This is what the storage will handle in one request. For my storage its the internal
page size (15MB). Its very fast. Except for some broken storages I expect that unmapping
on a thin-provisioned target means deleting of pointers or something similar not actually
writing something to the disks. This it what the SANITIZE command does.

Do you have evidence that it is really taking minutes somewhere?

I tested it with a fully allocated 1TB volume. In my case it was a question of about 20 seconds.
I think this was mainly due to the thousands of sync requests that had to be sent.

> 
> Perhaps we can have a new "discard_zeroes" field in bdrv_get_info, and
> the unmap functionality can be moved up to qemu-img convert?

Is there any other storage protocol out there that could benefit from it?

Peter

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 08/11] block-migration: efficiently encode zero blocks
  2013-07-01 16:09     ` Peter Lieven
@ 2013-07-02  7:36       ` Stefan Hajnoczi
  0 siblings, 0 replies; 69+ messages in thread
From: Stefan Hajnoczi @ 2013-07-02  7:36 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, pbonzini, ronniesahlberg, qemu-devel, Stefan Hajnoczi

On Mon, Jul 01, 2013 at 06:09:55PM +0200, Peter Lieven wrote:
> 
> Am 01.07.2013 um 16:13 schrieb Stefan Hajnoczi <stefanha@redhat.com>:
> 
> > On Thu, Jun 27, 2013 at 03:11:32PM +0200, Peter Lieven wrote:
> >> 
> >>     /* device name */
> >>     len = strlen(blk->bmds->bs->device_name);
> >>     qemu_put_byte(f, len);
> >>     qemu_put_buffer(f, (uint8_t *)blk->bmds->bs->device_name, len);
> >> 
> >> +    /* if a block is zero we need to flush here since the network
> >> +     * bandwidth is now a lot higher than the storage device bandwidth.
> >> +     * thus if we queue zero blocks we slow down the migration */
> >> +    if (flags & BLK_MIG_FLAG_ZERO_BLOCK) {
> >> +        qemu_fflush(f);
> >> +        return;
> >> +    }
> > 
> > Not sure I understand this.  Is the problem that the other side may
> > require an slow writev() to fill zeros?  So you want to tell the
> > destination about the zeroes ASAP.
> 
> Sorry, missed this question. Yes. If a lot of zero blocks is queued it delays
> migration because the target is idle while the source is queuing zero blocks.

Thanks.  I think this makes sense.

Stefan

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 08/11] block-migration: efficiently encode zero blocks
  2013-07-01 15:55     ` Peter Lieven
@ 2013-07-02  7:40       ` Stefan Hajnoczi
  2013-07-02 10:51         ` Paolo Bonzini
  0 siblings, 1 reply; 69+ messages in thread
From: Stefan Hajnoczi @ 2013-07-02  7:40 UTC (permalink / raw)
  To: Peter Lieven
  Cc: kwolf, Stefan Hajnoczi, Juan Quintela, qemu-devel,
	ronniesahlberg, pbonzini

On Mon, Jul 01, 2013 at 05:55:15PM +0200, Peter Lieven wrote:
> 
> Am 01.07.2013 um 16:13 schrieb Stefan Hajnoczi <stefanha@redhat.com>:
> 
> > On Thu, Jun 27, 2013 at 03:11:32PM +0200, Peter Lieven wrote:
> > 
> > This patch breaks cross-version blog migration.  We need to control
> > whether or not to use the new BLK_MIG_FLAG_ZERO_BLOCK flag.
> 
> you are right the upgrade way works, but downgrade not. what is the
> proposed way to fix this?

The safest way is to add a block migration flag that toggles the zero
block.  By default it would be off.

Maybe you would also add a MigrationCapability in qapi-schema.json to
indicated that zero blocks are supported in block migration.  That would
allow management tools to ask QEMU on both sides if they support zero
blocks.

Use migrate-set-capabilities to enable the zero block feature.

CCed Juan Quintela for advice on migration.

Stefan

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 11/11] iscsi: assert that sectors are aligned to LUN blocksize
  2013-07-01 15:59     ` Peter Lieven
@ 2013-07-02  7:44       ` Stefan Hajnoczi
  2013-07-02  8:28         ` Peter Lieven
  0 siblings, 1 reply; 69+ messages in thread
From: Stefan Hajnoczi @ 2013-07-02  7:44 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, pbonzini, ronniesahlberg, qemu-devel, Stefan Hajnoczi

On Mon, Jul 01, 2013 at 05:59:02PM +0200, Peter Lieven wrote:
> 
> Am 01.07.2013 um 16:35 schrieb Stefan Hajnoczi <stefanha@redhat.com>:
> 
> > On Thu, Jun 27, 2013 at 03:11:35PM +0200, Peter Lieven wrote:
> >> if the blocksize of an iSCSI LUN is bigger than the BDRV_SECTOR_SIZE
> >> it is possible that sector_num or nb_sectors are not correctly
> >> alligned.
> >> 
> >> to avoid corruption we fail requests which are misaligned.
> >> 
> >> Signed-off-by: Peter Lieven <pl@kamp.de>
> >> ---
> >> block/iscsi.c |   27 +++++++++++++++++++++++++++
> >> 1 file changed, 27 insertions(+)
> >> 
> >> diff --git a/block/iscsi.c b/block/iscsi.c
> >> index 0567b46..bff2e1f 100644
> >> --- a/block/iscsi.c
> >> +++ b/block/iscsi.c
> >> @@ -298,6 +298,13 @@ static int64_t sector_lun2qemu(int64_t sector, IscsiLun *iscsilun)
> >>     return sector * iscsilun->block_size / BDRV_SECTOR_SIZE;
> >> }
> >> 
> >> +static int64_t is_request_lun_aligned(int64_t sector_num, int nb_sectors,
> >> +                                      IscsiLun *iscsilun)
> >> +{
> >> +    return ((sector_num * BDRV_SECTOR_SIZE) % iscsilun->block_size ||
> >> +            (nb_sectors * BDRV_SECTOR_SIZE) % iscsilun->block_size) ? 0 : 1;
> >> +}
> > 
> > When QEMU does geometry probing it reads 2KB.  So if the LUN has 4KB
> > block size then QEMU will fail to open it?  This would also affect image
> > formats on top of iSCSI LUNs.
> 
> opening a 4K LUN does not fail with my target. So writing unaligned sectors will
> result in corruption. we should at least fail all those unaligned operations until
> we have a fix or workaround in place in place.

When QEMU tries to probe the master boot record or when an image format
performs an unaligned read (say 512 bytes instead of 4 KB), won't we
fail the read request?  Therefore opening will fail.

I agree that it's better to emit an error than to corrupt the disk.  I'm
just trying to understand what needs to be done on top of this to make 4
KB LUNs work.

> > AFAICT we have no way to passing I/O topology information up from the
> > block driver.
> 
> not at the moment. Paolo had a patch series back in Dec 2011, but it never went
> upstream. I asked him off list and he told me that 4K drives where not important
> enough and the Redhat bug for this was closed. Now with 4K iSCSI targets these
> old work could become important again.
> 
> It would be nice to find out if there was anything wrong with them or what has to
> be done to get them integrated.

Okay

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 11/11] iscsi: assert that sectors are aligned to LUN blocksize
  2013-07-02  7:44       ` Stefan Hajnoczi
@ 2013-07-02  8:28         ` Peter Lieven
  2013-07-02 10:44           ` Paolo Bonzini
  0 siblings, 1 reply; 69+ messages in thread
From: Peter Lieven @ 2013-07-02  8:28 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, pbonzini, ronniesahlberg, qemu-devel, Stefan Hajnoczi


Am 02.07.2013 um 09:44 schrieb Stefan Hajnoczi <stefanha@gmail.com>:

> On Mon, Jul 01, 2013 at 05:59:02PM +0200, Peter Lieven wrote:
>> 
>> Am 01.07.2013 um 16:35 schrieb Stefan Hajnoczi <stefanha@redhat.com>:
>> 
>>> On Thu, Jun 27, 2013 at 03:11:35PM +0200, Peter Lieven wrote:
>>>> if the blocksize of an iSCSI LUN is bigger than the BDRV_SECTOR_SIZE
>>>> it is possible that sector_num or nb_sectors are not correctly
>>>> alligned.
>>>> 
>>>> to avoid corruption we fail requests which are misaligned.
>>>> 
>>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>>> ---
>>>> block/iscsi.c |   27 +++++++++++++++++++++++++++
>>>> 1 file changed, 27 insertions(+)
>>>> 
>>>> diff --git a/block/iscsi.c b/block/iscsi.c
>>>> index 0567b46..bff2e1f 100644
>>>> --- a/block/iscsi.c
>>>> +++ b/block/iscsi.c
>>>> @@ -298,6 +298,13 @@ static int64_t sector_lun2qemu(int64_t sector, IscsiLun *iscsilun)
>>>>    return sector * iscsilun->block_size / BDRV_SECTOR_SIZE;
>>>> }
>>>> 
>>>> +static int64_t is_request_lun_aligned(int64_t sector_num, int nb_sectors,
>>>> +                                      IscsiLun *iscsilun)
>>>> +{
>>>> +    return ((sector_num * BDRV_SECTOR_SIZE) % iscsilun->block_size ||
>>>> +            (nb_sectors * BDRV_SECTOR_SIZE) % iscsilun->block_size) ? 0 : 1;
>>>> +}
>>> 
>>> When QEMU does geometry probing it reads 2KB.  So if the LUN has 4KB
>>> block size then QEMU will fail to open it?  This would also affect image
>>> formats on top of iSCSI LUNs.
>> 
>> opening a 4K LUN does not fail with my target. So writing unaligned sectors will
>> result in corruption. we should at least fail all those unaligned operations until
>> we have a fix or workaround in place in place.
> 
> When QEMU tries to probe the master boot record or when an image format
> performs an unaligned read (say 512 bytes instead of 4 KB), won't we
> fail the read request?  Therefore opening will fail.

I can speak for iSCSI. Currently it fails when opening a boot device. I think there
is no MBR probing done for non-boot devices.

Ronnie added some magic to iscsi_aio_readv to have an unaligned number for nb_sectors,
but this seems not to work anymore (I will add a patch to remove it). But as soon
as nb_sectors is aligned the sector_num will be rounded down and its not exactly
doing what it should. Writing will succeed but with the wrong offset. UNMAP will
always work.

> 
> I agree that it's better to emit an error than to corrupt the disk.  I'm
> just trying to understand what needs to be done on top of this to make 4
> KB LUNs work.

Reading is quite easy, you just have to have a wrapper that is reading an aligned
portion of sectors around the original request and then extracting what was requested.
Writing is very complicated. Requests have to be serialized, Read-Modify-Write for
unaligned write requests. Paolo had all this prepared already.

I wonder if it would be enough to have the block size of the host/iSCSI device propagated
to the guest drivers and read 4K (or the protocol block size bytes) when probing the MBR.

Windows 8, Windows 2012 and recent LInux support 4kN. So they should
handle all what its needed internally. What Paolo tried to integrate is sth like 512e support
for 4kN drives in qemu. What I have read in fact most of the advanced format drives do 512e
already. So the only need left is for iSCSI, but I would be totally fine to say if you want to attach
a 4kN device to qemu run an operating system that supports it.

Peter

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 07/11] iscsi: let bdrv_create conditionally zero out the device
  2013-07-01 21:36     ` Peter Lieven
@ 2013-07-02  9:22       ` Paolo Bonzini
  2013-07-02 10:36         ` Peter Lieven
  0 siblings, 1 reply; 69+ messages in thread
From: Paolo Bonzini @ 2013-07-02  9:22 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, ronniesahlberg, qemu-devel, stefanha

Il 01/07/2013 23:36, Peter Lieven ha scritto:
> 
> Am 01.07.2013 um 22:20 schrieb Paolo Bonzini <pbonzini@redhat.com>:
> 
>> Il 27/06/2013 15:11, Peter Lieven ha scritto:
>>> if the device supports unmapping and unmapped blocks read as
>>> zero ensure that the whole device is unmapped and report
>>> .has_zero_init = 1 in this case to speed up qemu-img convert.
>>
>> This can still take several minutes.  Do you need any special timeout in
>> libiscsi?
> 
> Not as far as I know. The number of unmapped sectors is bound by iscsilun->max_unmap.
> This is what the storage will handle in one request. For my storage its the internal
> page size (15MB). Its very fast. Except for some broken storages I expect that unmapping
> on a thin-provisioned target means deleting of pointers or something similar not actually
> writing something to the disks. This it what the SANITIZE command does.

Ok, I remember someone reporting timeouts when doing a discard with
virtio-scsi.  But maybe the problem there was that Linux parallelized
the requests excessively after splitting them.

> Do you have evidence that it is really taking minutes somewhere?
> 
> I tested it with a fully allocated 1TB volume. In my case it was a question of about 20 seconds.
> I think this was mainly due to the thousands of sync requests that had to be sent.
> 
>> Perhaps we can have a new "discard_zeroes" field in bdrv_get_info, and
>> the unmap functionality can be moved up to qemu-img convert?
> 
> Is there any other storage protocol out there that could benefit from it?

Definitely LVM.  Perhaps in the future gluster too, though right now it
only supports discard on files, not block devices.

Paolo

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 07/11] iscsi: let bdrv_create conditionally zero out the device
  2013-07-02  9:22       ` Paolo Bonzini
@ 2013-07-02 10:36         ` Peter Lieven
  2013-07-02 10:49           ` Paolo Bonzini
  0 siblings, 1 reply; 69+ messages in thread
From: Peter Lieven @ 2013-07-02 10:36 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, ronniesahlberg, qemu-devel, stefanha


Am 02.07.2013 um 11:22 schrieb Paolo Bonzini <pbonzini@redhat.com>:

> Il 01/07/2013 23:36, Peter Lieven ha scritto:
>> 
>> Am 01.07.2013 um 22:20 schrieb Paolo Bonzini <pbonzini@redhat.com>:
>> 
>>> Il 27/06/2013 15:11, Peter Lieven ha scritto:
>>>> if the device supports unmapping and unmapped blocks read as
>>>> zero ensure that the whole device is unmapped and report
>>>> .has_zero_init = 1 in this case to speed up qemu-img convert.
>>> 
>>> This can still take several minutes.  Do you need any special timeout in
>>> libiscsi?
>> 
>> Not as far as I know. The number of unmapped sectors is bound by iscsilun->max_unmap.
>> This is what the storage will handle in one request. For my storage its the internal
>> page size (15MB). Its very fast. Except for some broken storages I expect that unmapping
>> on a thin-provisioned target means deleting of pointers or something similar not actually
>> writing something to the disks. This it what the SANITIZE command does.
> 
> Ok, I remember someone reporting timeouts when doing a discard with
> virtio-scsi.  But maybe the problem there was that Linux parallelized
> the requests excessively after splitting them.
> 
>> Do you have evidence that it is really taking minutes somewhere?
>> 
>> I tested it with a fully allocated 1TB volume. In my case it was a question of about 20 seconds.
>> I think this was mainly due to the thousands of sync requests that had to be sent.
>> 
>>> Perhaps we can have a new "discard_zeroes" field in bdrv_get_info, and
>>> the unmap functionality can be moved up to qemu-img convert?
>> 
>> Is there any other storage protocol out there that could benefit from it?
> 
> Definitely LVM.  Perhaps in the future gluster too, though right now it
> only supports discard on files, not block devices.

Is discards on LVM sth that is already implemented in qemu?

Would you mind if we postpone the general approach to a later point.
It seems that this is much more complex than the iSCSI approach.
My intention was to fix the iSCSI thin-provisioning problem here.

Peter

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 11/11] iscsi: assert that sectors are aligned to LUN blocksize
  2013-07-02  8:28         ` Peter Lieven
@ 2013-07-02 10:44           ` Paolo Bonzini
  2013-07-02 10:49             ` Peter Lieven
  0 siblings, 1 reply; 69+ messages in thread
From: Paolo Bonzini @ 2013-07-02 10:44 UTC (permalink / raw)
  To: Peter Lieven
  Cc: kwolf, Stefan Hajnoczi, ronniesahlberg, qemu-devel, Stefan Hajnoczi

Il 02/07/2013 10:28, Peter Lieven ha scritto:
>> I agree that it's better to emit an error than to corrupt the disk.  I'm
>> just trying to understand what needs to be done on top of this to make 4
>> KB LUNs work.
> 
> Reading is quite easy, you just have to have a wrapper that is reading an aligned
> portion of sectors around the original request and then extracting what was requested.
> Writing is very complicated. Requests have to be serialized, Read-Modify-Write for
> unaligned write requests. Paolo had all this prepared already.
> 
> I wonder if it would be enough to have the block size of the host/iSCSI device propagated
> to the guest drivers and read 4K (or the protocol block size bytes) when probing the MBR.

No, propagating the size of the device is only correct if we're doing
SCSI passthrough (-device scsi-block) and in that case we're doing it
already.

For emulation, the choices are:

1) manually set the logical_block_size of the disk to 4096 (which means
the disk cannot be used as a boot disk, at least with BIOS; I don't know
about UEFI);

2) do 512e/4KN as in my patches.  You would still need to expose a 4K
physical_sector_size to be precise, but it is just an optimization.

Paolo

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 07/11] iscsi: let bdrv_create conditionally zero out the device
  2013-07-02 10:36         ` Peter Lieven
@ 2013-07-02 10:49           ` Paolo Bonzini
  2013-07-02 10:56             ` Peter Lieven
  0 siblings, 1 reply; 69+ messages in thread
From: Paolo Bonzini @ 2013-07-02 10:49 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, ronniesahlberg, qemu-devel, stefanha

Il 02/07/2013 12:36, Peter Lieven ha scritto:
>>>> Perhaps we can have a new "discard_zeroes" field in bdrv_get_info, and
>>>> the unmap functionality can be moved up to qemu-img convert?
>>>
>>> Is there any other storage protocol out there that could benefit from it?
>>
>> Definitely LVM.  Perhaps in the future gluster too, though right now it
>> only supports discard on files, not block devices.
> 
> Is discards on LVM sth that is already implemented in qemu?

Yes, it supports BLKDISCARD (see handle_aiocb_discard in
block/raw-posix.c).  Of course there is no way to query the host
discard_zeroes setting yet.

But even if it weren't implemented in QEMU, you should aim at making it
easier (if it's not too much work, which it isn't), not harder.  If you
do it in block/iscsi.c, the next person who comes will have to basically
undo your work and reimplement+retest it with the right API.

> Would you mind if we postpone the general approach to a later point.
> It seems that this is much more complex than the iSCSI approach.

It shouldn't be more complex at all, actually.  You just need to pass
the maximum unmap sectors and lbprz parameters through bdrv_get_info.

I'm not asking you to add support for BLKDISCARDZEROES and all that.
I'm asking you to do the work at the right level.

Paolo

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 11/11] iscsi: assert that sectors are aligned to LUN blocksize
  2013-07-02 10:44           ` Paolo Bonzini
@ 2013-07-02 10:49             ` Peter Lieven
  2013-07-02 10:53               ` Paolo Bonzini
  0 siblings, 1 reply; 69+ messages in thread
From: Peter Lieven @ 2013-07-02 10:49 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: kwolf, Stefan Hajnoczi, ronniesahlberg, qemu-devel, Stefan Hajnoczi


Am 02.07.2013 um 12:44 schrieb Paolo Bonzini <pbonzini@redhat.com>:

> Il 02/07/2013 10:28, Peter Lieven ha scritto:
>>> I agree that it's better to emit an error than to corrupt the disk.  I'm
>>> just trying to understand what needs to be done on top of this to make 4
>>> KB LUNs work.
>> 
>> Reading is quite easy, you just have to have a wrapper that is reading an aligned
>> portion of sectors around the original request and then extracting what was requested.
>> Writing is very complicated. Requests have to be serialized, Read-Modify-Write for
>> unaligned write requests. Paolo had all this prepared already.
>> 
>> I wonder if it would be enough to have the block size of the host/iSCSI device propagated
>> to the guest drivers and read 4K (or the protocol block size bytes) when probing the MBR.
> 
> No, propagating the size of the device is only correct if we're doing
> SCSI passthrough (-device scsi-block) and in that case we're doing it
> already.
> 
> For emulation, the choices are:
> 
> 1) manually set the logical_block_size of the disk to 4096 (which means
> the disk cannot be used as a boot disk, at least with BIOS; I don't know
> about UEFI);
> 
> 2) do 512e/4KN as in my patches.  You would still need to expose a 4K
> physical_sector_size to be precise, but it is just an optimization.

This seems the more reasonable for me. Can you check what needs to be done
to get your last version of your old series ready for upstream?

Is the exposing of a sector size > 512 byte complicated or is this something
that is already possible with IDE, Virtio-BLK, Virtio-SCSI etc.?

Peter

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 08/11] block-migration: efficiently encode zero blocks
  2013-07-02  7:40       ` Stefan Hajnoczi
@ 2013-07-02 10:51         ` Paolo Bonzini
  0 siblings, 0 replies; 69+ messages in thread
From: Paolo Bonzini @ 2013-07-02 10:51 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: kwolf, Stefan Hajnoczi, Juan Quintela, Peter Lieven, qemu-devel,
	ronniesahlberg

Il 02/07/2013 09:40, Stefan Hajnoczi ha scritto:
> On Mon, Jul 01, 2013 at 05:55:15PM +0200, Peter Lieven wrote:
>>
>> Am 01.07.2013 um 16:13 schrieb Stefan Hajnoczi <stefanha@redhat.com>:
>>
>>> On Thu, Jun 27, 2013 at 03:11:32PM +0200, Peter Lieven wrote:
>>>
>>> This patch breaks cross-version blog migration.  We need to control
>>> whether or not to use the new BLK_MIG_FLAG_ZERO_BLOCK flag.
>>
>> you are right the upgrade way works, but downgrade not. what is the
>> proposed way to fix this?
> 
> The safest way is to add a block migration flag that toggles the zero
> block.  By default it would be off.
> 
> Maybe you would also add a MigrationCapability in qapi-schema.json to
> indicated that zero blocks are supported in block migration.  That would
> allow management tools to ask QEMU on both sides if they support zero
> blocks.
> 
> Use migrate-set-capabilities to enable the zero block feature.

Yes, a capability makes sense.

Paolo

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 11/11] iscsi: assert that sectors are aligned to LUN blocksize
  2013-07-02 10:49             ` Peter Lieven
@ 2013-07-02 10:53               ` Paolo Bonzini
  0 siblings, 0 replies; 69+ messages in thread
From: Paolo Bonzini @ 2013-07-02 10:53 UTC (permalink / raw)
  To: Peter Lieven
  Cc: kwolf, Stefan Hajnoczi, ronniesahlberg, qemu-devel, Stefan Hajnoczi

Il 02/07/2013 12:49, Peter Lieven ha scritto:
> 
> Am 02.07.2013 um 12:44 schrieb Paolo Bonzini <pbonzini@redhat.com>:
> 
>> Il 02/07/2013 10:28, Peter Lieven ha scritto:
>>>> I agree that it's better to emit an error than to corrupt the disk.  I'm
>>>> just trying to understand what needs to be done on top of this to make 4
>>>> KB LUNs work.
>>>
>>> Reading is quite easy, you just have to have a wrapper that is reading an aligned
>>> portion of sectors around the original request and then extracting what was requested.
>>> Writing is very complicated. Requests have to be serialized, Read-Modify-Write for
>>> unaligned write requests. Paolo had all this prepared already.
>>>
>>> I wonder if it would be enough to have the block size of the host/iSCSI device propagated
>>> to the guest drivers and read 4K (or the protocol block size bytes) when probing the MBR.
>>
>> No, propagating the size of the device is only correct if we're doing
>> SCSI passthrough (-device scsi-block) and in that case we're doing it
>> already.
>>
>> For emulation, the choices are:
>>
>> 1) manually set the logical_block_size of the disk to 4096 (which means
>> the disk cannot be used as a boot disk, at least with BIOS; I don't know
>> about UEFI);
>>
>> 2) do 512e/4KN as in my patches.  You would still need to expose a 4K
>> physical_sector_size to be precise, but it is just an optimization.
> 
> This seems the more reasonable for me. Can you check what needs to be done
> to get your last version of your old series ready for upstream?
> 
> Is the exposing of a sector size > 512 byte complicated or is this something
> that is already possible with IDE, Virtio-BLK, Virtio-SCSI etc.?

It is possible with virtio-blk and SCSI with the logical_block_size
property of virtio-blk-*, scsi-hd and scsi-disk devices.  IIRC libvirt
supports it, too.

IDE only supports 512e/4KN (in general, not just in QEMU).

Paolo

> Peter
> 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 07/11] iscsi: let bdrv_create conditionally zero out the device
  2013-07-02 10:49           ` Paolo Bonzini
@ 2013-07-02 10:56             ` Peter Lieven
  2013-07-02 11:04               ` Paolo Bonzini
  0 siblings, 1 reply; 69+ messages in thread
From: Peter Lieven @ 2013-07-02 10:56 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, ronniesahlberg, qemu-devel, stefanha


Am 02.07.2013 um 12:49 schrieb Paolo Bonzini <pbonzini@redhat.com>:

> Il 02/07/2013 12:36, Peter Lieven ha scritto:
>>>>> Perhaps we can have a new "discard_zeroes" field in bdrv_get_info, and
>>>>> the unmap functionality can be moved up to qemu-img convert?
>>>> 
>>>> Is there any other storage protocol out there that could benefit from it?
>>> 
>>> Definitely LVM.  Perhaps in the future gluster too, though right now it
>>> only supports discard on files, not block devices.
>> 
>> Is discards on LVM sth that is already implemented in qemu?
> 
> Yes, it supports BLKDISCARD (see handle_aiocb_discard in
> block/raw-posix.c).  Of course there is no way to query the host
> discard_zeroes setting yet.

No way in qemu or no way at all?

> 
> But even if it weren't implemented in QEMU, you should aim at making it
> easier (if it's not too much work, which it isn't), not harder.  If you
> do it in block/iscsi.c, the next person who comes will have to basically
> undo your work and reimplement+retest it with the right API.
> 
>> Would you mind if we postpone the general approach to a later point.
>> It seems that this is much more complex than the iSCSI approach.
> 
> It shouldn't be more complex at all, actually.  You just need to pass
> the maximum unmap sectors and lbprz parameters through bdrv_get_info.
> 
> I'm not asking you to add support for BLKDISCARDZEROES and all that.
> I'm asking you to do the work at the right level.

You are right, if its possible in other protocols also it should be done at a higher level.
I will at least look into integrating it into bdrv_get_info for iSCSI and implement the
unmapping logic in qemu-img for the case that has_zero_init is 0.

Peter

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 07/11] iscsi: let bdrv_create conditionally zero out the device
  2013-07-02 10:56             ` Peter Lieven
@ 2013-07-02 11:04               ` Paolo Bonzini
  2013-07-02 11:18                 ` Peter Lieven
  0 siblings, 1 reply; 69+ messages in thread
From: Paolo Bonzini @ 2013-07-02 11:04 UTC (permalink / raw)
  To: Peter Lieven; +Cc: kwolf, ronniesahlberg, qemu-devel, stefanha

Il 02/07/2013 12:56, Peter Lieven ha scritto:
> 
> Am 02.07.2013 um 12:49 schrieb Paolo Bonzini <pbonzini@redhat.com>:
> 
>> Il 02/07/2013 12:36, Peter Lieven ha scritto:
>>>>>> Perhaps we can have a new "discard_zeroes" field in bdrv_get_info, and
>>>>>> the unmap functionality can be moved up to qemu-img convert?
>>>>>
>>>>> Is there any other storage protocol out there that could benefit from it?
>>>>
>>>> Definitely LVM.  Perhaps in the future gluster too, though right now it
>>>> only supports discard on files, not block devices.
>>>
>>> Is discards on LVM sth that is already implemented in qemu?
>>
>> Yes, it supports BLKDISCARD (see handle_aiocb_discard in
>> block/raw-posix.c).  Of course there is no way to query the host
>> discard_zeroes setting yet.
> 
> No way in qemu or no way at all?

No way in QEMU.  Linux has the BLKDISCARDZEROES ioctl, it takes an int*
as the third argument and puts 0/1 in it.

Paolo

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 07/11] iscsi: let bdrv_create conditionally zero out the device
  2013-07-02 11:04               ` Paolo Bonzini
@ 2013-07-02 11:18                 ` Peter Lieven
  0 siblings, 0 replies; 69+ messages in thread
From: Peter Lieven @ 2013-07-02 11:18 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: kwolf, ronniesahlberg, qemu-devel, stefanha


Am 02.07.2013 um 13:04 schrieb Paolo Bonzini <pbonzini@redhat.com>:

> Il 02/07/2013 12:56, Peter Lieven ha scritto:
>> 
>> Am 02.07.2013 um 12:49 schrieb Paolo Bonzini <pbonzini@redhat.com>:
>> 
>>> Il 02/07/2013 12:36, Peter Lieven ha scritto:
>>>>>>> Perhaps we can have a new "discard_zeroes" field in bdrv_get_info, and
>>>>>>> the unmap functionality can be moved up to qemu-img convert?
>>>>>> 
>>>>>> Is there any other storage protocol out there that could benefit from it?
>>>>> 
>>>>> Definitely LVM.  Perhaps in the future gluster too, though right now it
>>>>> only supports discard on files, not block devices.
>>>> 
>>>> Is discards on LVM sth that is already implemented in qemu?
>>> 
>>> Yes, it supports BLKDISCARD (see handle_aiocb_discard in
>>> block/raw-posix.c).  Of course there is no way to query the host
>>> discard_zeroes setting yet.
>> 
>> No way in qemu or no way at all?
> 
> No way in QEMU.  Linux has the BLKDISCARDZEROES ioctl, it takes an int*
> as the third argument and puts 0/1 in it.

okay, thanks for the pointer.

if you all don't mind I would split up my series and first send a v3 which adds the unproblematic stuff and
leave 

iscsi: let bdrv_create conditionally zero out the device
block-migration: efficiently encode zero blocks

out. both seem to need more time and cycles to follow the better approaches.

Peter

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 02/11] iscsi: read unmap info from block limits vpd page
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 02/11] iscsi: read unmap info from block limits vpd page Peter Lieven
@ 2013-07-03  3:43   ` ronnie sahlberg
  2013-07-03 21:23     ` Peter Lieven
  2013-07-10  9:23   ` Kevin Wolf
  2013-07-10  9:25   ` Kevin Wolf
  2 siblings, 1 reply; 69+ messages in thread
From: ronnie sahlberg @ 2013-07-03  3:43 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

max_unmap :

If the target does not return an explicit limit for max_unmap it will
return 0xffffffff which means "no limit".

I think you should add a check for max_unmap and clamp it down to
something sane.
Maybe a maximum of 128M ?

Same for bdc, that should also be checked and clamped down to sane values.


On Thu, Jun 27, 2013 at 11:11 PM, Peter Lieven <pl@kamp.de> wrote:
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  block/iscsi.c |   80 ++++++++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 56 insertions(+), 24 deletions(-)
>
> diff --git a/block/iscsi.c b/block/iscsi.c
> index a38a1bf..2e2455d 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -54,6 +54,8 @@ typedef struct IscsiLun {
>      uint8_t lbpu;
>      uint8_t lbpws;
>      uint8_t lbpws10;
> +    uint32_t max_unmap;
> +    uint32_t max_unmap_bdc;
>  } IscsiLun;
>
>  typedef struct IscsiAIOCB {
> @@ -1007,6 +1009,37 @@ static QemuOptsList runtime_opts = {
>      },
>  };
>
> +static struct scsi_task *iscsi_do_inquiry(struct iscsi_context *iscsi,
> +                                          int lun, int evpd, int pc) {
> +        int full_size;
> +        struct scsi_task *task = NULL;
> +        task = iscsi_inquiry_sync(iscsi, lun, evpd, pc, 64);
> +        if (task == NULL || task->status != SCSI_STATUS_GOOD) {
> +            goto fail;
> +        }
> +        full_size = scsi_datain_getfullsize(task);
> +        if (full_size > task->datain.size) {
> +            scsi_free_scsi_task(task);
> +
> +            /* we need more data for the full list */
> +            task = iscsi_inquiry_sync(iscsi, lun, evpd, pc, full_size);
> +            if (task == NULL || task->status != SCSI_STATUS_GOOD) {
> +                goto fail;
> +            }
> +        }
> +
> +        return task;
> +
> +fail:
> +        error_report("iSCSI: Inquiry command failed : %s",
> +                     iscsi_get_error(iscsi));
> +        if (task) {
> +            scsi_free_scsi_task(task);
> +            return NULL;
> +        }
> +        return NULL;
> +}
> +
>  /*
>   * We support iscsi url's on the form
>   * iscsi://[<username>%<password>@]<host>[:<port>]/<targetname>/<lun>
> @@ -1139,33 +1172,12 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags)
>
>      if (iscsilun->lbpme) {
>          struct scsi_inquiry_logical_block_provisioning *inq_lbp;
> -        int full_size;
> -
> -        task = iscsi_inquiry_sync(iscsi, iscsilun->lun, 1,
> -                                  SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING,
> -                                  64);
> -        if (task == NULL || task->status != SCSI_STATUS_GOOD) {
> -            error_report("iSCSI: Inquiry command failed : %s",
> -                   iscsi_get_error(iscsilun->iscsi));
> +        task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
> +                                SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING);
> +        if (task == NULL) {
>              ret = -EINVAL;
>              goto out;
>          }
> -        full_size = scsi_datain_getfullsize(task);
> -        if (full_size > task->datain.size) {
> -            scsi_free_scsi_task(task);
> -
> -            /* we need more data for the full list */
> -            task = iscsi_inquiry_sync(iscsi, iscsilun->lun, 1,
> -                                      SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING,
> -                                      full_size);
> -            if (task == NULL || task->status != SCSI_STATUS_GOOD) {
> -                error_report("iSCSI: Inquiry command failed : %s",
> -                             iscsi_get_error(iscsilun->iscsi));
> -                ret = -EINVAL;
> -                goto out;
> -            }
> -        }
> -
>          inq_lbp = scsi_datain_unmarshall(task);
>          if (inq_lbp == NULL) {
>              error_report("iSCSI: failed to unmarshall inquiry datain blob");
> @@ -1175,6 +1187,26 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags)
>          iscsilun->lbpu = inq_lbp->lbpu;
>          iscsilun->lbpws = inq_lbp->lbpws;
>          iscsilun->lbpws10 = inq_lbp->lbpws10;
> +        scsi_free_scsi_task(task);
> +        task = NULL;
> +    }
> +
> +    if (iscsilun->lbpu) {
> +        struct scsi_inquiry_block_limits *inq_bl;
> +        task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
> +                                SCSI_INQUIRY_PAGECODE_BLOCK_LIMITS);
> +        if (task == NULL) {
> +            ret = -EINVAL;
> +            goto out;
> +        }
> +        inq_bl = scsi_datain_unmarshall(task);
> +        if (inq_bl == NULL) {
> +            error_report("iSCSI: failed to unmarshall inquiry datain blob");
> +            ret = -EINVAL;
> +            goto out;
> +        }
> +        iscsilun->max_unmap = inq_bl->max_unmap;
> +        iscsilun->max_unmap_bdc = inq_bl->max_unmap_bdc;
>      }
>
>  #if defined(LIBISCSI_FEATURE_NOP_COUNTER)
> --
> 1.7.9.5
>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 02/11] iscsi: read unmap info from block limits vpd page
  2013-07-03  3:43   ` ronnie sahlberg
@ 2013-07-03 21:23     ` Peter Lieven
  2013-07-04 12:37       ` Paolo Bonzini
  0 siblings, 1 reply; 69+ messages in thread
From: Peter Lieven @ 2013-07-03 21:23 UTC (permalink / raw)
  To: ronnie sahlberg; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi


Am 03.07.2013 um 05:43 schrieb ronnie sahlberg <ronniesahlberg@gmail.com>:

> max_unmap :
> 
> If the target does not return an explicit limit for max_unmap it will
> return 0xffffffff which means "no limit".
> 

thanks for the remark. i wasn't aware.

> I think you should add a check for max_unmap and clamp it down to
> something sane.
> Maybe a maximum of 128M ?

okay, i personally would tent to less (32MB or 64MB).

> 
> Same for bdc, that should also be checked and clamped down to sane values.

BDC is not used. I had an implementation that sent multiple descriptors out, but
at least for my storage the maximum unmap counts not for each descriptors, but for all
together. So in this case we do not need the field at all. I forgot to remove it.

discard and write_zeroes will both only send one request up to max_unmap in size.

apropos write_zeroes: do you know if UNMAP is guaranteed to unmap data if lbprz == 1?

I have read in the specs something that the target might unmap the blocks or not touch them at all.
Maybe you have more information.

Peter




> 
> 
> On Thu, Jun 27, 2013 at 11:11 PM, Peter Lieven <pl@kamp.de> wrote:
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> block/iscsi.c |   80 ++++++++++++++++++++++++++++++++++++++++-----------------
>> 1 file changed, 56 insertions(+), 24 deletions(-)
>> 
>> diff --git a/block/iscsi.c b/block/iscsi.c
>> index a38a1bf..2e2455d 100644
>> --- a/block/iscsi.c
>> +++ b/block/iscsi.c
>> @@ -54,6 +54,8 @@ typedef struct IscsiLun {
>>     uint8_t lbpu;
>>     uint8_t lbpws;
>>     uint8_t lbpws10;
>> +    uint32_t max_unmap;
>> +    uint32_t max_unmap_bdc;
>> } IscsiLun;
>> 
>> typedef struct IscsiAIOCB {
>> @@ -1007,6 +1009,37 @@ static QemuOptsList runtime_opts = {
>>     },
>> };
>> 
>> +static struct scsi_task *iscsi_do_inquiry(struct iscsi_context *iscsi,
>> +                                          int lun, int evpd, int pc) {
>> +        int full_size;
>> +        struct scsi_task *task = NULL;
>> +        task = iscsi_inquiry_sync(iscsi, lun, evpd, pc, 64);
>> +        if (task == NULL || task->status != SCSI_STATUS_GOOD) {
>> +            goto fail;
>> +        }
>> +        full_size = scsi_datain_getfullsize(task);
>> +        if (full_size > task->datain.size) {
>> +            scsi_free_scsi_task(task);
>> +
>> +            /* we need more data for the full list */
>> +            task = iscsi_inquiry_sync(iscsi, lun, evpd, pc, full_size);
>> +            if (task == NULL || task->status != SCSI_STATUS_GOOD) {
>> +                goto fail;
>> +            }
>> +        }
>> +
>> +        return task;
>> +
>> +fail:
>> +        error_report("iSCSI: Inquiry command failed : %s",
>> +                     iscsi_get_error(iscsi));
>> +        if (task) {
>> +            scsi_free_scsi_task(task);
>> +            return NULL;
>> +        }
>> +        return NULL;
>> +}
>> +
>> /*
>>  * We support iscsi url's on the form
>>  * iscsi://[<username>%<password>@]<host>[:<port>]/<targetname>/<lun>
>> @@ -1139,33 +1172,12 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags)
>> 
>>     if (iscsilun->lbpme) {
>>         struct scsi_inquiry_logical_block_provisioning *inq_lbp;
>> -        int full_size;
>> -
>> -        task = iscsi_inquiry_sync(iscsi, iscsilun->lun, 1,
>> -                                  SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING,
>> -                                  64);
>> -        if (task == NULL || task->status != SCSI_STATUS_GOOD) {
>> -            error_report("iSCSI: Inquiry command failed : %s",
>> -                   iscsi_get_error(iscsilun->iscsi));
>> +        task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
>> +                                SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING);
>> +        if (task == NULL) {
>>             ret = -EINVAL;
>>             goto out;
>>         }
>> -        full_size = scsi_datain_getfullsize(task);
>> -        if (full_size > task->datain.size) {
>> -            scsi_free_scsi_task(task);
>> -
>> -            /* we need more data for the full list */
>> -            task = iscsi_inquiry_sync(iscsi, iscsilun->lun, 1,
>> -                                      SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING,
>> -                                      full_size);
>> -            if (task == NULL || task->status != SCSI_STATUS_GOOD) {
>> -                error_report("iSCSI: Inquiry command failed : %s",
>> -                             iscsi_get_error(iscsilun->iscsi));
>> -                ret = -EINVAL;
>> -                goto out;
>> -            }
>> -        }
>> -
>>         inq_lbp = scsi_datain_unmarshall(task);
>>         if (inq_lbp == NULL) {
>>             error_report("iSCSI: failed to unmarshall inquiry datain blob");
>> @@ -1175,6 +1187,26 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags)
>>         iscsilun->lbpu = inq_lbp->lbpu;
>>         iscsilun->lbpws = inq_lbp->lbpws;
>>         iscsilun->lbpws10 = inq_lbp->lbpws10;
>> +        scsi_free_scsi_task(task);
>> +        task = NULL;
>> +    }
>> +
>> +    if (iscsilun->lbpu) {
>> +        struct scsi_inquiry_block_limits *inq_bl;
>> +        task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
>> +                                SCSI_INQUIRY_PAGECODE_BLOCK_LIMITS);
>> +        if (task == NULL) {
>> +            ret = -EINVAL;
>> +            goto out;
>> +        }
>> +        inq_bl = scsi_datain_unmarshall(task);
>> +        if (inq_bl == NULL) {
>> +            error_report("iSCSI: failed to unmarshall inquiry datain blob");
>> +            ret = -EINVAL;
>> +            goto out;
>> +        }
>> +        iscsilun->max_unmap = inq_bl->max_unmap;
>> +        iscsilun->max_unmap_bdc = inq_bl->max_unmap_bdc;
>>     }
>> 
>> #if defined(LIBISCSI_FEATURE_NOP_COUNTER)
>> --
>> 1.7.9.5
>> 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 02/11] iscsi: read unmap info from block limits vpd page
  2013-07-03 21:23     ` Peter Lieven
@ 2013-07-04 12:37       ` Paolo Bonzini
  2013-07-04 21:07         ` Peter Lieven
  0 siblings, 1 reply; 69+ messages in thread
From: Paolo Bonzini @ 2013-07-04 12:37 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, ronnie sahlberg

Il 03/07/2013 23:23, Peter Lieven ha scritto:
> BDC is not used. I had an implementation that sent multiple descriptors out, but
> at least for my storage the maximum unmap counts not for each descriptors, but for all
> together. So in this case we do not need the field at all. I forgot to remove it.
> 
> discard and write_zeroes will both only send one request up to max_unmap in size.
> 
> apropos write_zeroes: do you know if UNMAP is guaranteed to unmap data if lbprz == 1?

Yes.  On the other hand note that WRITE_SAME should be guaranteed _not_
to unmap if lbprz == 0 and you do WRITE_SAME with UNMAP and a zero
payload, but I suspect there may be buggy targets here.

> I have read in the specs something that the target might unmap the blocks or not touch them at all.
> Maybe you have more information.

That's even true of UNMAP itself, actually. :)

The storage can always "upgrade" a block from unmapped to anchored and
from anchored to allocated, so UNMAP can be a no-op and still comply
with the standard.

Paolo

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 02/11] iscsi: read unmap info from block limits vpd page
  2013-07-04 12:37       ` Paolo Bonzini
@ 2013-07-04 21:07         ` Peter Lieven
  2013-07-05  6:28           ` Paolo Bonzini
  2013-07-05  7:11           ` ronnie sahlberg
  0 siblings, 2 replies; 69+ messages in thread
From: Peter Lieven @ 2013-07-04 21:07 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel, ronnie sahlberg


Am 04.07.2013 um 14:37 schrieb Paolo Bonzini <pbonzini@redhat.com>:

> Il 03/07/2013 23:23, Peter Lieven ha scritto:
>> BDC is not used. I had an implementation that sent multiple descriptors out, but
>> at least for my storage the maximum unmap counts not for each descriptors, but for all
>> together. So in this case we do not need the field at all. I forgot to remove it.
>> 
>> discard and write_zeroes will both only send one request up to max_unmap in size.
>> 
>> apropos write_zeroes: do you know if UNMAP is guaranteed to unmap data if lbprz == 1?
> 
> Yes.  On the other hand note that WRITE_SAME should be guaranteed _not_
> to unmap if lbprz == 0 and you do WRITE_SAME with UNMAP and a zero
> payload, but I suspect there may be buggy targets here.
> 
>> I have read in the specs something that the target might unmap the blocks or not touch them at all.
>> Maybe you have more information.
> 
> That's even true of UNMAP itself, actually. :)
> 
> The storage can always "upgrade" a block from unmapped to anchored and
> from anchored to allocated, so UNMAP can be a no-op and still comply
> with the standard.

My concern was, if I UNMAP a block and lbprz == 1 is it guaranteed that it reads
as zero afterwards? Regardless if the target decides to "upgrade" the block or do
not unmap the block?

Peter

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 02/11] iscsi: read unmap info from block limits vpd page
  2013-07-04 21:07         ` Peter Lieven
@ 2013-07-05  6:28           ` Paolo Bonzini
  2013-07-05  7:11           ` ronnie sahlberg
  1 sibling, 0 replies; 69+ messages in thread
From: Paolo Bonzini @ 2013-07-05  6:28 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Kevin Wolf, ronnie sahlberg, qemu-devel, Stefan Hajnoczi

Il 04/07/2013 23:07, Peter Lieven ha scritto:
> 
> Am 04.07.2013 um 14:37 schrieb Paolo Bonzini <pbonzini@redhat.com>:
> 
>> Il 03/07/2013 23:23, Peter Lieven ha scritto:
>>> BDC is not used. I had an implementation that sent multiple descriptors out, but
>>> at least for my storage the maximum unmap counts not for each descriptors, but for all
>>> together. So in this case we do not need the field at all. I forgot to remove it.
>>>
>>> discard and write_zeroes will both only send one request up to max_unmap in size.
>>>
>>> apropos write_zeroes: do you know if UNMAP is guaranteed to unmap data if lbprz == 1?
>>
>> Yes.  On the other hand note that WRITE_SAME should be guaranteed _not_
>> to unmap if lbprz == 0 and you do WRITE_SAME with UNMAP and a zero
>> payload, but I suspect there may be buggy targets here.
>>
>>> I have read in the specs something that the target might unmap the blocks or not touch them at all.
>>> Maybe you have more information.
>>
>> That's even true of UNMAP itself, actually. :)
>>
>> The storage can always "upgrade" a block from unmapped to anchored and
>> from anchored to allocated, so UNMAP can be a no-op and still comply
>> with the standard.
> 
> My concern was, if I UNMAP a block and lbprz == 1 is it guaranteed that it reads
> as zero afterwards? Regardless if the target decides to "upgrade" the block or do
> not unmap the block?

I would be very surprised, but if you are worried about that, it
definitely won't happen with WRITE_SAME.

Paolo

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 02/11] iscsi: read unmap info from block limits vpd page
  2013-07-04 21:07         ` Peter Lieven
  2013-07-05  6:28           ` Paolo Bonzini
@ 2013-07-05  7:11           ` ronnie sahlberg
  2013-07-06 22:15             ` Peter Lieven
  1 sibling, 1 reply; 69+ messages in thread
From: ronnie sahlberg @ 2013-07-05  7:11 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

The device MIGHT map or anchor the blocks after the unmap  but it may
only do so if the blocks that become mapped are all zero.

So I think you can  safely assume that if lbprz==1 then it will always
read back as zero no matter what happens internally in the target.

Either the block becomes unmapped/deallocated and then it will read
back as zero,
or the blocks is automatically re-mapped to anchored/mapped again
but this can only happen if the mapped block is all zero so once again
it will still read back as all zero.

===

4.7.3.5 Autonomous LBA transitions
A device server may perform the following actions at any time:
a) transition any deallocated LBA to mapped;
b) transition any anchored LBA to mapped; or
c) transition any deallocated LBA to anchored.
If the LBPRZ bit in the READ CAPACITY (16) parameter data (see 5.16.2)
is set to one, and a mapped LBA
references a logical block that contains:
a) user data with all bits set to zero; and
Working Draft SCSI Block Commands – 3 (SBC-3)
27T10/BSR INCITS 514 Revision 35d
15 May 2013
b) protection information, if any, set to FFFF_FFFF_FFFF_FFFFh,
then the device server may transition that mapped LBA to anchored or
deallocated at any time.
The logical block provisioning st

On Thu, Jul 4, 2013 at 2:07 PM, Peter Lieven <pl@kamp.de> wrote:
>
> Am 04.07.2013 um 14:37 schrieb Paolo Bonzini <pbonzini@redhat.com>:
>
>> Il 03/07/2013 23:23, Peter Lieven ha scritto:
>>> BDC is not used. I had an implementation that sent multiple descriptors out, but
>>> at least for my storage the maximum unmap counts not for each descriptors, but for all
>>> together. So in this case we do not need the field at all. I forgot to remove it.
>>>
>>> discard and write_zeroes will both only send one request up to max_unmap in size.
>>>
>>> apropos write_zeroes: do you know if UNMAP is guaranteed to unmap data if lbprz == 1?
>>
>> Yes.  On the other hand note that WRITE_SAME should be guaranteed _not_
>> to unmap if lbprz == 0 and you do WRITE_SAME with UNMAP and a zero
>> payload, but I suspect there may be buggy targets here.
>>
>>> I have read in the specs something that the target might unmap the blocks or not touch them at all.
>>> Maybe you have more information.
>>
>> That's even true of UNMAP itself, actually. :)
>>
>> The storage can always "upgrade" a block from unmapped to anchored and
>> from anchored to allocated, so UNMAP can be a no-op and still comply
>> with the standard.
>
> My concern was, if I UNMAP a block and lbprz == 1 is it guaranteed that it reads
> as zero afterwards? Regardless if the target decides to "upgrade" the block or do
> not unmap the block?
>
> Peter
>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 02/11] iscsi: read unmap info from block limits vpd page
  2013-07-05  7:11           ` ronnie sahlberg
@ 2013-07-06 22:15             ` Peter Lieven
  2013-07-06 23:23               ` ronnie sahlberg
  0 siblings, 1 reply; 69+ messages in thread
From: Peter Lieven @ 2013-07-06 22:15 UTC (permalink / raw)
  To: ronnie sahlberg; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

ok, to sum up you see no potential problem with my patch to optimize write zeroes by
unmap iff lbprz==1 and lbpme == 1 ?

the alternative would be to use writesame16 and sent a zero block. this would allow
an optimization also if lbprz == 0. in this case i would not set the unmap bit.

peter


Am 05.07.2013 um 09:11 schrieb ronnie sahlberg <ronniesahlberg@gmail.com>:

> The device MIGHT map or anchor the blocks after the unmap  but it may
> only do so if the blocks that become mapped are all zero.
> 
> So I think you can  safely assume that if lbprz==1 then it will always
> read back as zero no matter what happens internally in the target.
> 
> Either the block becomes unmapped/deallocated and then it will read
> back as zero,
> or the blocks is automatically re-mapped to anchored/mapped again
> but this can only happen if the mapped block is all zero so once again
> it will still read back as all zero.
> 
> ===
> 
> 4.7.3.5 Autonomous LBA transitions
> A device server may perform the following actions at any time:
> a) transition any deallocated LBA to mapped;
> b) transition any anchored LBA to mapped; or
> c) transition any deallocated LBA to anchored.
> If the LBPRZ bit in the READ CAPACITY (16) parameter data (see 5.16.2)
> is set to one, and a mapped LBA
> references a logical block that contains:
> a) user data with all bits set to zero; and
> Working Draft SCSI Block Commands – 3 (SBC-3)
> 27T10/BSR INCITS 514 Revision 35d
> 15 May 2013
> b) protection information, if any, set to FFFF_FFFF_FFFF_FFFFh,
> then the device server may transition that mapped LBA to anchored or
> deallocated at any time.
> The logical block provisioning st
> 
> On Thu, Jul 4, 2013 at 2:07 PM, Peter Lieven <pl@kamp.de> wrote:
>> 
>> Am 04.07.2013 um 14:37 schrieb Paolo Bonzini <pbonzini@redhat.com>:
>> 
>>> Il 03/07/2013 23:23, Peter Lieven ha scritto:
>>>> BDC is not used. I had an implementation that sent multiple descriptors out, but
>>>> at least for my storage the maximum unmap counts not for each descriptors, but for all
>>>> together. So in this case we do not need the field at all. I forgot to remove it.
>>>> 
>>>> discard and write_zeroes will both only send one request up to max_unmap in size.
>>>> 
>>>> apropos write_zeroes: do you know if UNMAP is guaranteed to unmap data if lbprz == 1?
>>> 
>>> Yes.  On the other hand note that WRITE_SAME should be guaranteed _not_
>>> to unmap if lbprz == 0 and you do WRITE_SAME with UNMAP and a zero
>>> payload, but I suspect there may be buggy targets here.
>>> 
>>>> I have read in the specs something that the target might unmap the blocks or not touch them at all.
>>>> Maybe you have more information.
>>> 
>>> That's even true of UNMAP itself, actually. :)
>>> 
>>> The storage can always "upgrade" a block from unmapped to anchored and
>>> from anchored to allocated, so UNMAP can be a no-op and still comply
>>> with the standard.
>> 
>> My concern was, if I UNMAP a block and lbprz == 1 is it guaranteed that it reads
>> as zero afterwards? Regardless if the target decides to "upgrade" the block or do
>> not unmap the block?
>> 
>> Peter
>> 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 02/11] iscsi: read unmap info from block limits vpd page
  2013-07-06 22:15             ` Peter Lieven
@ 2013-07-06 23:23               ` ronnie sahlberg
  0 siblings, 0 replies; 69+ messages in thread
From: ronnie sahlberg @ 2013-07-06 23:23 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi

Right.

I don't see any problem with your patch.


On Sat, Jul 6, 2013 at 3:15 PM, Peter Lieven <pl@kamp.de> wrote:
> ok, to sum up you see no potential problem with my patch to optimize write zeroes by
> unmap iff lbprz==1 and lbpme == 1 ?

ACK


>
> the alternative would be to use writesame16 and sent a zero block. this would allow
> an optimization also if lbprz == 0. in this case i would not set the unmap bit.
>
> peter
>
>
> Am 05.07.2013 um 09:11 schrieb ronnie sahlberg <ronniesahlberg@gmail.com>:
>
>> The device MIGHT map or anchor the blocks after the unmap  but it may
>> only do so if the blocks that become mapped are all zero.
>>
>> So I think you can  safely assume that if lbprz==1 then it will always
>> read back as zero no matter what happens internally in the target.
>>
>> Either the block becomes unmapped/deallocated and then it will read
>> back as zero,
>> or the blocks is automatically re-mapped to anchored/mapped again
>> but this can only happen if the mapped block is all zero so once again
>> it will still read back as all zero.
>>
>> ===
>>
>> 4.7.3.5 Autonomous LBA transitions
>> A device server may perform the following actions at any time:
>> a) transition any deallocated LBA to mapped;
>> b) transition any anchored LBA to mapped; or
>> c) transition any deallocated LBA to anchored.
>> If the LBPRZ bit in the READ CAPACITY (16) parameter data (see 5.16.2)
>> is set to one, and a mapped LBA
>> references a logical block that contains:
>> a) user data with all bits set to zero; and
>> Working Draft SCSI Block Commands – 3 (SBC-3)
>> 27T10/BSR INCITS 514 Revision 35d
>> 15 May 2013
>> b) protection information, if any, set to FFFF_FFFF_FFFF_FFFFh,
>> then the device server may transition that mapped LBA to anchored or
>> deallocated at any time.
>> The logical block provisioning st
>>
>> On Thu, Jul 4, 2013 at 2:07 PM, Peter Lieven <pl@kamp.de> wrote:
>>>
>>> Am 04.07.2013 um 14:37 schrieb Paolo Bonzini <pbonzini@redhat.com>:
>>>
>>>> Il 03/07/2013 23:23, Peter Lieven ha scritto:
>>>>> BDC is not used. I had an implementation that sent multiple descriptors out, but
>>>>> at least for my storage the maximum unmap counts not for each descriptors, but for all
>>>>> together. So in this case we do not need the field at all. I forgot to remove it.
>>>>>
>>>>> discard and write_zeroes will both only send one request up to max_unmap in size.
>>>>>
>>>>> apropos write_zeroes: do you know if UNMAP is guaranteed to unmap data if lbprz == 1?
>>>>
>>>> Yes.  On the other hand note that WRITE_SAME should be guaranteed _not_
>>>> to unmap if lbprz == 0 and you do WRITE_SAME with UNMAP and a zero
>>>> payload, but I suspect there may be buggy targets here.
>>>>
>>>>> I have read in the specs something that the target might unmap the blocks or not touch them at all.
>>>>> Maybe you have more information.
>>>>
>>>> That's even true of UNMAP itself, actually. :)
>>>>
>>>> The storage can always "upgrade" a block from unmapped to anchored and
>>>> from anchored to allocated, so UNMAP can be a no-op and still comply
>>>> with the standard.
>>>
>>> My concern was, if I UNMAP a block and lbprz == 1 is it guaranteed that it reads
>>> as zero afterwards? Regardless if the target decides to "upgrade" the block or do
>>> not unmap the block?
>>>
>>> Peter
>>>
>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 01/11] iscsi: add logical block provisioning information to iscsilun
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 01/11] iscsi: add logical block provisioning information to iscsilun Peter Lieven
  2013-07-01 13:35   ` Stefan Hajnoczi
@ 2013-07-10  9:19   ` Kevin Wolf
  1 sibling, 0 replies; 69+ messages in thread
From: Kevin Wolf @ 2013-07-10  9:19 UTC (permalink / raw)
  To: Peter Lieven; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
> Signed-off-by: Peter Lieven <pl@kamp.de>
> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/iscsi.c |   47 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 0bbf0b1..a38a1bf 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -49,6 +49,11 @@ typedef struct IscsiLun {
>      uint64_t num_blocks;
>      int events;
>      QEMUTimer *nop_timer;
> +    uint8_t lbpme;
> +    uint8_t lbprz;
> +    uint8_t lbpu;
> +    uint8_t lbpws;
> +    uint8_t lbpws10;
>  } IscsiLun;
>  
>  typedef struct IscsiAIOCB {
> @@ -948,6 +953,8 @@ static int iscsi_readcapacity_sync(IscsiLun *iscsilun)
>                  } else {
>                      iscsilun->block_size = rc16->block_length;
>                      iscsilun->num_blocks = rc16->returned_lba + 1;
> +                    iscsilun->lbpme = rc16->lbpme;
> +                    iscsilun->lbprz = rc16->lbprz;
>                  }
>              }
>              break;
> @@ -1130,6 +1137,46 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags)
>          bs->sg = 1;
>      }
>  
> +    if (iscsilun->lbpme) {
> +        struct scsi_inquiry_logical_block_provisioning *inq_lbp;
> +        int full_size;
> +
> +        task = iscsi_inquiry_sync(iscsi, iscsilun->lun, 1,
> +                                  SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING,
> +                                  64);
> +        if (task == NULL || task->status != SCSI_STATUS_GOOD) {
> +            error_report("iSCSI: Inquiry command failed : %s",
> +                   iscsi_get_error(iscsilun->iscsi));

The indentation is off here.

> +            ret = -EINVAL;
> +            goto out;
> +        }
> +        full_size = scsi_datain_getfullsize(task);
> +        if (full_size > task->datain.size) {
> +            scsi_free_scsi_task(task);
> +
> +            /* we need more data for the full list */
> +            task = iscsi_inquiry_sync(iscsi, iscsilun->lun, 1,
> +                                      SCSI_INQUIRY_PAGECODE_LOGICAL_BLOCK_PROVISIONING,
> +                                      full_size);
> +            if (task == NULL || task->status != SCSI_STATUS_GOOD) {
> +                error_report("iSCSI: Inquiry command failed : %s",
> +                             iscsi_get_error(iscsilun->iscsi));
> +                ret = -EINVAL;
> +                goto out;
> +            }
> +        }
> +
> +        inq_lbp = scsi_datain_unmarshall(task);
> +        if (inq_lbp == NULL) {
> +            error_report("iSCSI: failed to unmarshall inquiry datain blob");
> +            ret = -EINVAL;
> +            goto out;
> +        }
> +        iscsilun->lbpu = inq_lbp->lbpu;
> +        iscsilun->lbpws = inq_lbp->lbpws;
> +        iscsilun->lbpws10 = inq_lbp->lbpws10;
> +    }

As mentioned in Stefan's mail, scsi_free_scsi_task() should be added in
this patch rather than patch 2.

Kevin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 02/11] iscsi: read unmap info from block limits vpd page
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 02/11] iscsi: read unmap info from block limits vpd page Peter Lieven
  2013-07-03  3:43   ` ronnie sahlberg
@ 2013-07-10  9:23   ` Kevin Wolf
  2013-07-10  9:25   ` Kevin Wolf
  2 siblings, 0 replies; 69+ messages in thread
From: Kevin Wolf @ 2013-07-10  9:23 UTC (permalink / raw)
  To: Peter Lieven; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  block/iscsi.c |   80 ++++++++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 56 insertions(+), 24 deletions(-)

> @@ -1175,6 +1187,26 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags)
>          iscsilun->lbpu = inq_lbp->lbpu;
>          iscsilun->lbpws = inq_lbp->lbpws;
>          iscsilun->lbpws10 = inq_lbp->lbpws10;
> +        scsi_free_scsi_task(task);
> +        task = NULL;
> +    }
> +
> +    if (iscsilun->lbpu) {
> +        struct scsi_inquiry_block_limits *inq_bl;
> +        task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
> +                                SCSI_INQUIRY_PAGECODE_BLOCK_LIMITS);
> +        if (task == NULL) {
> +            ret = -EINVAL;
> +            goto out;
> +        }
> +        inq_bl = scsi_datain_unmarshall(task);
> +        if (inq_bl == NULL) {
> +            error_report("iSCSI: failed to unmarshall inquiry datain blob");
> +            ret = -EINVAL;
> +            goto out;
> +        }
> +        iscsilun->max_unmap = inq_bl->max_unmap;
> +        iscsilun->max_unmap_bdc = inq_bl->max_unmap_bdc;
>      }

How about scsi_free_scsi_task() here as well? It's caught at the end of
the function, but I think it's nicer to free it in the block that uses
it locally.

Kevin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 02/11] iscsi: read unmap info from block limits vpd page
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 02/11] iscsi: read unmap info from block limits vpd page Peter Lieven
  2013-07-03  3:43   ` ronnie sahlberg
  2013-07-10  9:23   ` Kevin Wolf
@ 2013-07-10  9:25   ` Kevin Wolf
  2 siblings, 0 replies; 69+ messages in thread
From: Kevin Wolf @ 2013-07-10  9:25 UTC (permalink / raw)
  To: Peter Lieven; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  block/iscsi.c |   80 ++++++++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 56 insertions(+), 24 deletions(-)
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index a38a1bf..2e2455d 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -54,6 +54,8 @@ typedef struct IscsiLun {
>      uint8_t lbpu;
>      uint8_t lbpws;
>      uint8_t lbpws10;
> +    uint32_t max_unmap;
> +    uint32_t max_unmap_bdc;
>  } IscsiLun;
>  
>  typedef struct IscsiAIOCB {
> @@ -1007,6 +1009,37 @@ static QemuOptsList runtime_opts = {
>      },
>  };
>  
> +static struct scsi_task *iscsi_do_inquiry(struct iscsi_context *iscsi,
> +                                          int lun, int evpd, int pc) {

Oops, forgot to add the comment in the other mail... This isn't valid
coding style, the brace belongs on a line of its own.

Kevin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 03/11] iscsi: add bdrv_co_is_allocated
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 03/11] iscsi: add bdrv_co_is_allocated Peter Lieven
  2013-07-01 13:46   ` Stefan Hajnoczi
@ 2013-07-10  9:41   ` Kevin Wolf
  2013-07-10 13:49     ` Peter Lieven
  1 sibling, 1 reply; 69+ messages in thread
From: Kevin Wolf @ 2013-07-10  9:41 UTC (permalink / raw)
  To: Peter Lieven; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
> this patch adds a coroutine for .bdrv_co_is_allocated as well as
> a generic framework that can be used to build coroutines in block/iscsi.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  block/iscsi.c |  123 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 123 insertions(+)
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 2e2455d..d31ee95 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -58,6 +58,15 @@ typedef struct IscsiLun {
>      uint32_t max_unmap_bdc;
>  } IscsiLun;
>  
> +typedef struct IscsiTask {
> +    int status;
> +    int complete;
> +    int retries;
> +    int do_retry;
> +    struct scsi_task *task;
> +    Coroutine *co;
> +} IscsiTask;
> +
>  typedef struct IscsiAIOCB {
>      BlockDriverAIOCB common;
>      QEMUIOVector *qiov;
> @@ -113,6 +122,45 @@ iscsi_schedule_bh(IscsiAIOCB *acb)
>      qemu_bh_schedule(acb->bh);
>  }
>  
> +static void
> +iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
> +                        void *command_data, void *opaque)
> +{
> +    struct IscsiTask *iTask = opaque;
> +    struct scsi_task *task = command_data;
> +
> +    iTask->complete = 1;
> +    iTask->status = status;
> +    iTask->do_retry = 0;
> +    iTask->task = task;
> +
> +    if (iTask->retries-- > 0 && status == SCSI_STATUS_CHECK_CONDITION
> +        && task->sense.key == SCSI_SENSE_UNIT_ATTENTION) {
> +        iTask->do_retry = 1;
> +        goto out;
> +    }
> +
> +    if (status != SCSI_STATUS_GOOD) {
> +        error_report("iSCSI: Failure. %s", iscsi_get_error(iscsi));
> +        goto out;
> +    }
> +
> +out:
> +    if (iTask->status != SCSI_STATUS_GOOD) {
> +        scsi_free_scsi_task(iTask->task);
> +        iTask->task = NULL;
> +    }
> +    if (iTask->co) {
> +        qemu_coroutine_enter(iTask->co, NULL);
> +    }
> +}

In which context does this callback run? If this is from a separate
thread, you may not do all of this (without even holding the QBL). You
should probably use a BH to switch to the I/O thread.

> +
> +static void iscsi_co_init_iscsitask(IscsiLun *iscsilun, struct IscsiTask *iTask)
> +{
> +    memset(iTask, 0x00, sizeof(struct IscsiTask));
> +    iTask->co = qemu_coroutine_self();
> +    iTask->retries = ISCSI_CMD_RETRIES;
> +}

Matter of taste, but in my own code I prefer not to have a memset:

*iTask = (struct IscsiTask) {
    .co         = qemu_coroutine_self(),
    .retries    = ISCSI_CMD_RETRIES,
};

>  static void
>  iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void *command_data,
> @@ -807,6 +855,79 @@ iscsi_getlength(BlockDriverState *bs)
>      return len;
>  }
>  
> +static int coroutine_fn iscsi_co_is_allocated(BlockDriverState *bs,
> +                                              int64_t sector_num,
> +                                              int nb_sectors, int *pnum)
> +{
> +    IscsiLun *iscsilun = bs->opaque;
> +    struct scsi_get_lba_status *lbas = NULL;
> +    struct scsi_lba_status_descriptor *lbasd = NULL;
> +    struct IscsiTask iTask;
> +    int ret;
> +
> +    /* LUN does not support logical block provisioning */
> +    if (iscsilun->lbpme == 0) {
> +        *pnum = nb_sectors;
> +        return 1;
> +    }
> +
> +    iscsi_co_init_iscsitask(iscsilun, &iTask);
> +
> +retry:
> +    if (iscsi_get_lba_status_task(iscsilun->iscsi, iscsilun->lun,
> +                                  sector_qemu2lun(sector_num, iscsilun),
> +                                  8 + 16, iscsi_co_generic_cb,
> +                                  &iTask) == NULL) {
> +        goto fail;
> +    }
> +
> +    while (!iTask.complete) {
> +        iscsi_set_events(iscsilun);
> +        qemu_coroutine_yield();
> +    }
> +
> +    if (iTask.do_retry) {
> +        goto retry;
> +    }
> +
> +    if (iTask.status != SCSI_STATUS_GOOD) {
> +        /* in case the get_lba_status_callout fails (i.e.
> +         * because the device is busy or the cmd is not
> +         * supported) we pretend all blocks are allocated
> +         * for backwards compatiblity */
> +        *pnum = nb_sectors;
> +        return 1;
> +    }
> +
> +    lbas = scsi_datain_unmarshall(iTask.task);
> +    if (lbas == NULL) {
> +        goto fail;
> +    }
> +
> +    lbasd = &lbas->descriptors[0];
> +
> +    if (sector_qemu2lun(sector_num, iscsilun) != lbasd->lba) {
> +        goto fail;
> +    }
> +
> +    *pnum = lbasd->num_blocks * (iscsilun->block_size / BDRV_SECTOR_SIZE);
> +    if (*pnum > nb_sectors) {
> +        *pnum = nb_sectors;
> +    }
> +
> +    ret = lbasd->provisioning == SCSI_PROVISIONING_TYPE_MAPPED ? 1 : 0;

Wouldn't it be safer to default to allocated, in case new values are
added in a later protocol version? I agree with returning 0 for both
unmapped and anchored, though.

> +    scsi_free_scsi_task(iTask.task);
> +    return ret;
> +
> +fail:
> +    if (iTask.task != NULL) {
> +        scsi_free_scsi_task(iTask.task);
> +    }
> +    *pnum = 0;
> +    return 0;
> +}

Kevin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 04/11] iscsi: add bdrv_co_write_zeroes
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 04/11] iscsi: add bdrv_co_write_zeroes Peter Lieven
@ 2013-07-10  9:54   ` Kevin Wolf
  0 siblings, 0 replies; 69+ messages in thread
From: Kevin Wolf @ 2013-07-10  9:54 UTC (permalink / raw)
  To: Peter Lieven; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
> write zeroes is emulated by unmap if the target supports unmapping
> an unmapped blocks read as zero. this emulation is only done if the
> device was opened with BDRV_O_UNMAP and the request can be handled
> within a single request. a failback to writev is performed otherwise.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  block/iscsi.c |   44 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 44 insertions(+)
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index d31ee95..92e66a6 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -928,6 +928,49 @@ fail:
>      return 0;
>  }
>  
> +static int
> +coroutine_fn iscsi_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
> +                                   int nb_sectors)
> +{
> +    IscsiLun *iscsilun = bs->opaque;
> +    struct IscsiTask iTask;
> +    struct unmap_list list[1];

Why declare an array when it's only one element anyway?

> +    if (!iscsilun->lbprz || !iscsilun->lbpu || !(bs->open_flags & BDRV_O_UNMAP) ||
> +        nb_sectors * BDRV_SECTOR_SIZE / iscsilun->block_size > iscsilun->max_unmap) {

The first part of this line is sector_qemu2lun().

> +        /* fall back to writev */
> +        return -ENOTSUP;
> +    }
> +
> +    iscsi_co_init_iscsitask(iscsilun, &iTask);
> +
> +    list[0].lba = sector_qemu2lun(sector_num, iscsilun);
> +    list[0].num = nb_sectors * BDRV_SECTOR_SIZE / iscsilun->block_size;

Same here.

Kevin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 05/11] block: add bdrv_write_zeroes()
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 05/11] block: add bdrv_write_zeroes() Peter Lieven
@ 2013-07-10  9:56   ` Kevin Wolf
  0 siblings, 0 replies; 69+ messages in thread
From: Kevin Wolf @ 2013-07-10  9:56 UTC (permalink / raw)
  To: Peter Lieven; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  block.c               |   27 +++++++++++++++++++--------
>  include/block/block.h |    2 ++
>  2 files changed, 21 insertions(+), 8 deletions(-)

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 06/11] block/raw: add bdrv_co_write_zeroes
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 06/11] block/raw: add bdrv_co_write_zeroes Peter Lieven
@ 2013-07-10  9:57   ` Kevin Wolf
  0 siblings, 0 replies; 69+ messages in thread
From: Kevin Wolf @ 2013-07-10  9:57 UTC (permalink / raw)
  To: Peter Lieven; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  block/raw.c |    8 ++++++++
>  1 file changed, 8 insertions(+)

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 07/11] iscsi: let bdrv_create conditionally zero out the device
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 07/11] iscsi: let bdrv_create conditionally zero out the device Peter Lieven
  2013-07-01 13:58   ` Stefan Hajnoczi
  2013-07-01 20:20   ` Paolo Bonzini
@ 2013-07-10 10:14   ` Kevin Wolf
  2013-07-10 13:52     ` Peter Lieven
  2 siblings, 1 reply; 69+ messages in thread
From: Kevin Wolf @ 2013-07-10 10:14 UTC (permalink / raw)
  To: Peter Lieven; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
> if the device supports unmapping and unmapped blocks read as
> zero ensure that the whole device is unmapped and report
> .has_zero_init = 1 in this case to speed up qemu-img convert.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  block/iscsi.c |   72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 71 insertions(+), 1 deletion(-)
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 92e66a6..621ca40 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -1436,7 +1436,8 @@ static int iscsi_truncate(BlockDriverState *bs, int64_t offset)
>  
>  static int iscsi_has_zero_init(BlockDriverState *bs)
>  {
> -    return 0;
> +    IscsiLun *iscsilun = bs->opaque;
> +    return (iscsilun->lbpu && iscsilun->lbprz) ? 1 : 0;
>  }
>  
>  static int iscsi_create(const char *filename, QEMUOptionParameter *options)
> @@ -1446,6 +1447,7 @@ static int iscsi_create(const char *filename, QEMUOptionParameter *options)
>      BlockDriverState bs;
>      IscsiLun *iscsilun = NULL;
>      QDict *bs_options;
> +    struct scsi_task *task = NULL;
>  
>      memset(&bs, 0, sizeof(BlockDriverState));
>  
> @@ -1481,7 +1483,75 @@ static int iscsi_create(const char *filename, QEMUOptionParameter *options)
>      }
>  
>      ret = 0;

Noticed something independent from your patch, a bit more context:

    if (bs.total_sectors < total_size) {
        ret = -ENOSPC;
    }

    ret = 0;

The -ENOSPC case can't ever trigger, there's a 'goto out' missing. Maybe
you can include another patch too fix this in the next version of your
series?

Kevin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 09/11] iscsi: factor out sector conversions
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 09/11] iscsi: factor out sector conversions Peter Lieven
@ 2013-07-10 11:29   ` Kevin Wolf
  2013-07-10 14:07     ` Peter Lieven
  0 siblings, 1 reply; 69+ messages in thread
From: Kevin Wolf @ 2013-07-10 11:29 UTC (permalink / raw)
  To: Peter Lieven; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  block/iscsi.c |   18 +++++++++++-------
>  1 file changed, 11 insertions(+), 7 deletions(-)

A bit confusing that you add only one function (because the other one is
already there), but convert both directions. Looks correct, though.

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 10/11] iscsi: ignore aio_discard if unsupported
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 10/11] iscsi: ignore aio_discard if unsupported Peter Lieven
@ 2013-07-10 11:33   ` Kevin Wolf
  2013-07-10 14:04     ` Peter Lieven
  0 siblings, 1 reply; 69+ messages in thread
From: Kevin Wolf @ 2013-07-10 11:33 UTC (permalink / raw)
  To: Peter Lieven; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
> if the target does not support UNMAP or the request
> is too big silently ignore the discard request.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>

Why not loop for the "too big" case? You can probably use the same logic
for unmapping the whole device in .bdrv_create and here.

Kevin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 11/11] iscsi: assert that sectors are aligned to LUN blocksize
  2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 11/11] iscsi: assert that sectors are aligned to LUN blocksize Peter Lieven
  2013-07-01 14:35   ` Stefan Hajnoczi
@ 2013-07-10 11:38   ` Kevin Wolf
  2013-07-10 14:02     ` Peter Lieven
  1 sibling, 1 reply; 69+ messages in thread
From: Kevin Wolf @ 2013-07-10 11:38 UTC (permalink / raw)
  To: Peter Lieven; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
> if the blocksize of an iSCSI LUN is bigger than the BDRV_SECTOR_SIZE
> it is possible that sector_num or nb_sectors are not correctly
> alligned.
> 
> to avoid corruption we fail requests which are misaligned.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>
> ---
>  block/iscsi.c |   27 +++++++++++++++++++++++++++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 0567b46..bff2e1f 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -298,6 +298,13 @@ static int64_t sector_lun2qemu(int64_t sector, IscsiLun *iscsilun)
>      return sector * iscsilun->block_size / BDRV_SECTOR_SIZE;
>  }
>  
> +static int64_t is_request_lun_aligned(int64_t sector_num, int nb_sectors,
> +                                      IscsiLun *iscsilun)

This should certainly return bool instead of int64_t?

> +{
> +    return ((sector_num * BDRV_SECTOR_SIZE) % iscsilun->block_size ||
> +            (nb_sectors * BDRV_SECTOR_SIZE) % iscsilun->block_size) ? 0 : 1;

'x ? 0 : 1' is usually written '!x'.

Kevin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 03/11] iscsi: add bdrv_co_is_allocated
  2013-07-10  9:41   ` Kevin Wolf
@ 2013-07-10 13:49     ` Peter Lieven
  2013-07-10 14:43       ` Kevin Wolf
  0 siblings, 1 reply; 69+ messages in thread
From: Peter Lieven @ 2013-07-10 13:49 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 10.07.2013 11:41, schrieb Kevin Wolf:
> Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
>> this patch adds a coroutine for .bdrv_co_is_allocated as well as
>> a generic framework that can be used to build coroutines in block/iscsi.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>  block/iscsi.c |  123 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 123 insertions(+)
>>
>> diff --git a/block/iscsi.c b/block/iscsi.c
>> index 2e2455d..d31ee95 100644
>> --- a/block/iscsi.c
>> +++ b/block/iscsi.c
>> @@ -58,6 +58,15 @@ typedef struct IscsiLun {
>>      uint32_t max_unmap_bdc;
>>  } IscsiLun;
>>  
>> +typedef struct IscsiTask {
>> +    int status;
>> +    int complete;
>> +    int retries;
>> +    int do_retry;
>> +    struct scsi_task *task;
>> +    Coroutine *co;
>> +} IscsiTask;
>> +
>>  typedef struct IscsiAIOCB {
>>      BlockDriverAIOCB common;
>>      QEMUIOVector *qiov;
>> @@ -113,6 +122,45 @@ iscsi_schedule_bh(IscsiAIOCB *acb)
>>      qemu_bh_schedule(acb->bh);
>>  }
>>  
>> +static void
>> +iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
>> +                        void *command_data, void *opaque)
>> +{
>> +    struct IscsiTask *iTask = opaque;
>> +    struct scsi_task *task = command_data;
>> +
>> +    iTask->complete = 1;
>> +    iTask->status = status;
>> +    iTask->do_retry = 0;
>> +    iTask->task = task;
>> +
>> +    if (iTask->retries-- > 0 && status == SCSI_STATUS_CHECK_CONDITION
>> +        && task->sense.key == SCSI_SENSE_UNIT_ATTENTION) {
>> +        iTask->do_retry = 1;
>> +        goto out;
>> +    }
>> +
>> +    if (status != SCSI_STATUS_GOOD) {
>> +        error_report("iSCSI: Failure. %s", iscsi_get_error(iscsi));
>> +        goto out;
>> +    }
>> +
>> +out:
>> +    if (iTask->status != SCSI_STATUS_GOOD) {
>> +        scsi_free_scsi_task(iTask->task);
>> +        iTask->task = NULL;
>> +    }
>> +    if (iTask->co) {
>> +        qemu_coroutine_enter(iTask->co, NULL);
>> +    }
>> +}
> In which context does this callback run? If this is from a separate
> thread, you may not do all of this (without even holding the QBL). You
> should probably use a BH to switch to the I/O thread.
good question. the callbacks can only be fired when iscsi_service() is invoked.
this is currently done by registereing iscsi_process_read/write/flush via qemu_aio_set_fd_handler().

so the callbacks are invoked in the aio context it seems. can this be right?
>> +
>> +static void iscsi_co_init_iscsitask(IscsiLun *iscsilun, struct IscsiTask *iTask)
>> +{
>> +    memset(iTask, 0x00, sizeof(struct IscsiTask));
>> +    iTask->co = qemu_coroutine_self();
>> +    iTask->retries = ISCSI_CMD_RETRIES;
>> +}
> Matter of taste, but in my own code I prefer not to have a memset:
>
> *iTask = (struct IscsiTask) {
>     .co         = qemu_coroutine_self(),
>     .retries    = ISCSI_CMD_RETRIES,
> };
ok
>
>>  static void
>>  iscsi_abort_task_cb(struct iscsi_context *iscsi, int status, void *command_data,
>> @@ -807,6 +855,79 @@ iscsi_getlength(BlockDriverState *bs)
>>      return len;
>>  }
>>  
>> +static int coroutine_fn iscsi_co_is_allocated(BlockDriverState *bs,
>> +                                              int64_t sector_num,
>> +                                              int nb_sectors, int *pnum)
>> +{
>> +    IscsiLun *iscsilun = bs->opaque;
>> +    struct scsi_get_lba_status *lbas = NULL;
>> +    struct scsi_lba_status_descriptor *lbasd = NULL;
>> +    struct IscsiTask iTask;
>> +    int ret;
>> +
>> +    /* LUN does not support logical block provisioning */
>> +    if (iscsilun->lbpme == 0) {
>> +        *pnum = nb_sectors;
>> +        return 1;
>> +    }
>> +
>> +    iscsi_co_init_iscsitask(iscsilun, &iTask);
>> +
>> +retry:
>> +    if (iscsi_get_lba_status_task(iscsilun->iscsi, iscsilun->lun,
>> +                                  sector_qemu2lun(sector_num, iscsilun),
>> +                                  8 + 16, iscsi_co_generic_cb,
>> +                                  &iTask) == NULL) {
>> +        goto fail;
>> +    }
>> +
>> +    while (!iTask.complete) {
>> +        iscsi_set_events(iscsilun);
>> +        qemu_coroutine_yield();
>> +    }
>> +
>> +    if (iTask.do_retry) {
>> +        goto retry;
>> +    }
>> +
>> +    if (iTask.status != SCSI_STATUS_GOOD) {
>> +        /* in case the get_lba_status_callout fails (i.e.
>> +         * because the device is busy or the cmd is not
>> +         * supported) we pretend all blocks are allocated
>> +         * for backwards compatiblity */
>> +        *pnum = nb_sectors;
>> +        return 1;
>> +    }
>> +
>> +    lbas = scsi_datain_unmarshall(iTask.task);
>> +    if (lbas == NULL) {
>> +        goto fail;
>> +    }
>> +
>> +    lbasd = &lbas->descriptors[0];
>> +
>> +    if (sector_qemu2lun(sector_num, iscsilun) != lbasd->lba) {
>> +        goto fail;
>> +    }
>> +
>> +    *pnum = lbasd->num_blocks * (iscsilun->block_size / BDRV_SECTOR_SIZE);
>> +    if (*pnum > nb_sectors) {
>> +        *pnum = nb_sectors;
>> +    }
>> +
>> +    ret = lbasd->provisioning == SCSI_PROVISIONING_TYPE_MAPPED ? 1 : 0;
> Wouldn't it be safer to default to allocated, in case new values are
> added in a later protocol version? I agree with returning 0 for both
> unmapped and anchored, though.
agreed.
>
>> +    scsi_free_scsi_task(iTask.task);
>> +    return ret;
>> +
>> +fail:
>> +    if (iTask.task != NULL) {
>> +        scsi_free_scsi_task(iTask.task);
>> +    }
>> +    *pnum = 0;
>> +    return 0;
>> +}
> Kevin
Peter

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 07/11] iscsi: let bdrv_create conditionally zero out the device
  2013-07-10 10:14   ` Kevin Wolf
@ 2013-07-10 13:52     ` Peter Lieven
  0 siblings, 0 replies; 69+ messages in thread
From: Peter Lieven @ 2013-07-10 13:52 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 10.07.2013 12:14, schrieb Kevin Wolf:
> Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
>> if the device supports unmapping and unmapped blocks read as
>> zero ensure that the whole device is unmapped and report
>> .has_zero_init = 1 in this case to speed up qemu-img convert.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>  block/iscsi.c |   72 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 71 insertions(+), 1 deletion(-)
>>
>> diff --git a/block/iscsi.c b/block/iscsi.c
>> index 92e66a6..621ca40 100644
>> --- a/block/iscsi.c
>> +++ b/block/iscsi.c
>> @@ -1436,7 +1436,8 @@ static int iscsi_truncate(BlockDriverState *bs, int64_t offset)
>>  
>>  static int iscsi_has_zero_init(BlockDriverState *bs)
>>  {
>> -    return 0;
>> +    IscsiLun *iscsilun = bs->opaque;
>> +    return (iscsilun->lbpu && iscsilun->lbprz) ? 1 : 0;
>>  }
>>  
>>  static int iscsi_create(const char *filename, QEMUOptionParameter *options)
>> @@ -1446,6 +1447,7 @@ static int iscsi_create(const char *filename, QEMUOptionParameter *options)
>>      BlockDriverState bs;
>>      IscsiLun *iscsilun = NULL;
>>      QDict *bs_options;
>> +    struct scsi_task *task = NULL;
>>  
>>      memset(&bs, 0, sizeof(BlockDriverState));
>>  
>> @@ -1481,7 +1483,75 @@ static int iscsi_create(const char *filename, QEMUOptionParameter *options)
>>      }
>>  
>>      ret = 0;
> Noticed something independent from your patch, a bit more context:
>
>     if (bs.total_sectors < total_size) {
>         ret = -ENOSPC;
>     }
>
>     ret = 0;
>
> The -ENOSPC case can't ever trigger, there's a 'goto out' missing. Maybe
> you can include another patch too fix this in the next version of your
> series?
will do.

the patch 7/11 itself will not be included. i was asked to make a general
approach to zero out the device in qemu-img if a device has discard_zeroes.

Peter

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 11/11] iscsi: assert that sectors are aligned to LUN blocksize
  2013-07-10 11:38   ` Kevin Wolf
@ 2013-07-10 14:02     ` Peter Lieven
  0 siblings, 0 replies; 69+ messages in thread
From: Peter Lieven @ 2013-07-10 14:02 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 10.07.2013 13:38, schrieb Kevin Wolf:
> Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
>> if the blocksize of an iSCSI LUN is bigger than the BDRV_SECTOR_SIZE
>> it is possible that sector_num or nb_sectors are not correctly
>> alligned.
>>
>> to avoid corruption we fail requests which are misaligned.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>  block/iscsi.c |   27 +++++++++++++++++++++++++++
>>  1 file changed, 27 insertions(+)
>>
>> diff --git a/block/iscsi.c b/block/iscsi.c
>> index 0567b46..bff2e1f 100644
>> --- a/block/iscsi.c
>> +++ b/block/iscsi.c
>> @@ -298,6 +298,13 @@ static int64_t sector_lun2qemu(int64_t sector, IscsiLun *iscsilun)
>>      return sector * iscsilun->block_size / BDRV_SECTOR_SIZE;
>>  }
>>  
>> +static int64_t is_request_lun_aligned(int64_t sector_num, int nb_sectors,
>> +                                      IscsiLun *iscsilun)
> This should certainly return bool instead of int64_t?
yes
>
>> +{
>> +    return ((sector_num * BDRV_SECTOR_SIZE) % iscsilun->block_size ||
>> +            (nb_sectors * BDRV_SECTOR_SIZE) % iscsilun->block_size) ? 0 : 1;
> 'x ? 0 : 1' is usually written '!x'.
ok
>
> Kevin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 10/11] iscsi: ignore aio_discard if unsupported
  2013-07-10 11:33   ` Kevin Wolf
@ 2013-07-10 14:04     ` Peter Lieven
  2013-07-10 14:28       ` Kevin Wolf
  0 siblings, 1 reply; 69+ messages in thread
From: Peter Lieven @ 2013-07-10 14:04 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 10.07.2013 13:33, schrieb Kevin Wolf:
> Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
>> if the target does not support UNMAP or the request
>> is too big silently ignore the discard request.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
> Why not loop for the "too big" case? You can probably use the same logic
> for unmapping the whole device in .bdrv_create and here.
right, but looping in an aio function seemed not so trivial to me.
it seems more and more obvious to me that the best would be to change
all the remaining aio routines to co routines.

in this case i could add the too big logic in iscsi_co_discard and simply call
it from iscsi_co_write_zeroes.

Peter
>
> Kevin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 09/11] iscsi: factor out sector conversions
  2013-07-10 11:29   ` Kevin Wolf
@ 2013-07-10 14:07     ` Peter Lieven
  0 siblings, 0 replies; 69+ messages in thread
From: Peter Lieven @ 2013-07-10 14:07 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 10.07.2013 13:29, schrieb Kevin Wolf:
> Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>>  block/iscsi.c |   18 +++++++++++-------
>>  1 file changed, 11 insertions(+), 7 deletions(-)
> A bit confusing that you add only one function (because the other one is
> already there), but convert both directions. Looks correct, though.
this is because qemu_sector2lun was only used for the lba, but never
for nb_sectors by Ronnie.

Peer
>
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 10/11] iscsi: ignore aio_discard if unsupported
  2013-07-10 14:04     ` Peter Lieven
@ 2013-07-10 14:28       ` Kevin Wolf
  2013-07-10 14:49         ` Peter Lieven
  0 siblings, 1 reply; 69+ messages in thread
From: Kevin Wolf @ 2013-07-10 14:28 UTC (permalink / raw)
  To: Peter Lieven; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 10.07.2013 um 16:04 hat Peter Lieven geschrieben:
> Am 10.07.2013 13:33, schrieb Kevin Wolf:
> > Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
> >> if the target does not support UNMAP or the request
> >> is too big silently ignore the discard request.
> >>
> >> Signed-off-by: Peter Lieven <pl@kamp.de>
> > Why not loop for the "too big" case? You can probably use the same logic
> > for unmapping the whole device in .bdrv_create and here.
> right, but looping in an aio function seemed not so trivial to me.
> it seems more and more obvious to me that the best would be to change
> all the remaining aio routines to co routines.

The pattern for AIO functions is that the real work of submitting
requests is done in the AIO callback, and it submits new AIO requests
calling back into the same callback as long as acb->remaining_secs > 0
(or something like that).

You can still see that kind of thing alive in qed_aio_next_io(), (most
of?) the rest is converted to coroutines because it makes the code look
nicer.

> in this case i could add the too big logic in iscsi_co_discard and simply call
> it from iscsi_co_write_zeroes.

I think that would be the nicest solution.

Kevin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 03/11] iscsi: add bdrv_co_is_allocated
  2013-07-10 13:49     ` Peter Lieven
@ 2013-07-10 14:43       ` Kevin Wolf
  2013-07-10 14:49         ` Peter Lieven
  0 siblings, 1 reply; 69+ messages in thread
From: Kevin Wolf @ 2013-07-10 14:43 UTC (permalink / raw)
  To: Peter Lieven; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 10.07.2013 um 15:49 hat Peter Lieven geschrieben:
> Am 10.07.2013 11:41, schrieb Kevin Wolf:
> > Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
> >> this patch adds a coroutine for .bdrv_co_is_allocated as well as
> >> a generic framework that can be used to build coroutines in block/iscsi.
> >>
> >> Signed-off-by: Peter Lieven <pl@kamp.de>
> >> ---
> >>  block/iscsi.c |  123 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 123 insertions(+)
> >>
> >> diff --git a/block/iscsi.c b/block/iscsi.c
> >> index 2e2455d..d31ee95 100644
> >> --- a/block/iscsi.c
> >> +++ b/block/iscsi.c
> >> @@ -58,6 +58,15 @@ typedef struct IscsiLun {
> >>      uint32_t max_unmap_bdc;
> >>  } IscsiLun;
> >>  
> >> +typedef struct IscsiTask {
> >> +    int status;
> >> +    int complete;
> >> +    int retries;
> >> +    int do_retry;
> >> +    struct scsi_task *task;
> >> +    Coroutine *co;
> >> +} IscsiTask;
> >> +
> >>  typedef struct IscsiAIOCB {
> >>      BlockDriverAIOCB common;
> >>      QEMUIOVector *qiov;
> >> @@ -113,6 +122,45 @@ iscsi_schedule_bh(IscsiAIOCB *acb)
> >>      qemu_bh_schedule(acb->bh);
> >>  }
> >>  
> >> +static void
> >> +iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
> >> +                        void *command_data, void *opaque)
> >> +{
> >> +    struct IscsiTask *iTask = opaque;
> >> +    struct scsi_task *task = command_data;
> >> +
> >> +    iTask->complete = 1;
> >> +    iTask->status = status;
> >> +    iTask->do_retry = 0;
> >> +    iTask->task = task;
> >> +
> >> +    if (iTask->retries-- > 0 && status == SCSI_STATUS_CHECK_CONDITION
> >> +        && task->sense.key == SCSI_SENSE_UNIT_ATTENTION) {
> >> +        iTask->do_retry = 1;
> >> +        goto out;
> >> +    }
> >> +
> >> +    if (status != SCSI_STATUS_GOOD) {
> >> +        error_report("iSCSI: Failure. %s", iscsi_get_error(iscsi));
> >> +        goto out;
> >> +    }
> >> +
> >> +out:
> >> +    if (iTask->status != SCSI_STATUS_GOOD) {
> >> +        scsi_free_scsi_task(iTask->task);
> >> +        iTask->task = NULL;
> >> +    }
> >> +    if (iTask->co) {
> >> +        qemu_coroutine_enter(iTask->co, NULL);
> >> +    }
> >> +}
> > In which context does this callback run? If this is from a separate
> > thread, you may not do all of this (without even holding the QBL). You
> > should probably use a BH to switch to the I/O thread.
> good question. the callbacks can only be fired when iscsi_service() is invoked.
> this is currently done by registereing iscsi_process_read/write/flush via qemu_aio_set_fd_handler().
> 
> so the callbacks are invoked in the aio context it seems. can this be right?

Yes, I took another look and I think it's fine.

Kevin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 10/11] iscsi: ignore aio_discard if unsupported
  2013-07-10 14:28       ` Kevin Wolf
@ 2013-07-10 14:49         ` Peter Lieven
  2013-07-10 14:58           ` Kevin Wolf
  0 siblings, 1 reply; 69+ messages in thread
From: Peter Lieven @ 2013-07-10 14:49 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 10.07.2013 16:28, schrieb Kevin Wolf:
> Am 10.07.2013 um 16:04 hat Peter Lieven geschrieben:
>> Am 10.07.2013 13:33, schrieb Kevin Wolf:
>>> Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
>>>> if the target does not support UNMAP or the request
>>>> is too big silently ignore the discard request.
>>>>
>>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>> Why not loop for the "too big" case? You can probably use the same logic
>>> for unmapping the whole device in .bdrv_create and here.
>> right, but looping in an aio function seemed not so trivial to me.
>> it seems more and more obvious to me that the best would be to change
>> all the remaining aio routines to co routines.
> The pattern for AIO functions is that the real work of submitting
> requests is done in the AIO callback, and it submits new AIO requests
> calling back into the same callback as long as acb->remaining_secs > 0
> (or something like that).
>
> You can still see that kind of thing alive in qed_aio_next_io(), (most
> of?) the rest is converted to coroutines because it makes the code look
> nicer.
would you agree if I leave the easy version in just to fix the potential
problems if iscsi_aio_discard is called with too high nb_sectors or
on a storage where UNMAP is unsupported.

I will add a TODO with the comment that the limit of iscsi->max_unmap should
be replaced by a loop once the routine is replaced by a coroutine?
>
>> in this case i could add the too big logic in iscsi_co_discard and simply call
>> it from iscsi_co_write_zeroes.
> I think that would be the nicest solution.
I promised to take care of this for 1.7.0 latest.

Peter

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 03/11] iscsi: add bdrv_co_is_allocated
  2013-07-10 14:43       ` Kevin Wolf
@ 2013-07-10 14:49         ` Peter Lieven
  2013-07-10 14:54           ` Kevin Wolf
  0 siblings, 1 reply; 69+ messages in thread
From: Peter Lieven @ 2013-07-10 14:49 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 10.07.2013 16:43, schrieb Kevin Wolf:
> Am 10.07.2013 um 15:49 hat Peter Lieven geschrieben:
>> Am 10.07.2013 11:41, schrieb Kevin Wolf:
>>> Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
>>>> this patch adds a coroutine for .bdrv_co_is_allocated as well as
>>>> a generic framework that can be used to build coroutines in block/iscsi.
>>>>
>>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>>> ---
>>>>  block/iscsi.c |  123 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>  1 file changed, 123 insertions(+)
>>>>
>>>> diff --git a/block/iscsi.c b/block/iscsi.c
>>>> index 2e2455d..d31ee95 100644
>>>> --- a/block/iscsi.c
>>>> +++ b/block/iscsi.c
>>>> @@ -58,6 +58,15 @@ typedef struct IscsiLun {
>>>>      uint32_t max_unmap_bdc;
>>>>  } IscsiLun;
>>>>  
>>>> +typedef struct IscsiTask {
>>>> +    int status;
>>>> +    int complete;
>>>> +    int retries;
>>>> +    int do_retry;
>>>> +    struct scsi_task *task;
>>>> +    Coroutine *co;
>>>> +} IscsiTask;
>>>> +
>>>>  typedef struct IscsiAIOCB {
>>>>      BlockDriverAIOCB common;
>>>>      QEMUIOVector *qiov;
>>>> @@ -113,6 +122,45 @@ iscsi_schedule_bh(IscsiAIOCB *acb)
>>>>      qemu_bh_schedule(acb->bh);
>>>>  }
>>>>  
>>>> +static void
>>>> +iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
>>>> +                        void *command_data, void *opaque)
>>>> +{
>>>> +    struct IscsiTask *iTask = opaque;
>>>> +    struct scsi_task *task = command_data;
>>>> +
>>>> +    iTask->complete = 1;
>>>> +    iTask->status = status;
>>>> +    iTask->do_retry = 0;
>>>> +    iTask->task = task;
>>>> +
>>>> +    if (iTask->retries-- > 0 && status == SCSI_STATUS_CHECK_CONDITION
>>>> +        && task->sense.key == SCSI_SENSE_UNIT_ATTENTION) {
>>>> +        iTask->do_retry = 1;
>>>> +        goto out;
>>>> +    }
>>>> +
>>>> +    if (status != SCSI_STATUS_GOOD) {
>>>> +        error_report("iSCSI: Failure. %s", iscsi_get_error(iscsi));
>>>> +        goto out;
>>>> +    }
>>>> +
>>>> +out:
>>>> +    if (iTask->status != SCSI_STATUS_GOOD) {
>>>> +        scsi_free_scsi_task(iTask->task);
>>>> +        iTask->task = NULL;
>>>> +    }
>>>> +    if (iTask->co) {
>>>> +        qemu_coroutine_enter(iTask->co, NULL);
>>>> +    }
>>>> +}
>>> In which context does this callback run? If this is from a separate
>>> thread, you may not do all of this (without even holding the QBL). You
>>> should probably use a BH to switch to the I/O thread.
>> good question. the callbacks can only be fired when iscsi_service() is invoked.
>> this is currently done by registereing iscsi_process_read/write/flush via qemu_aio_set_fd_handler().
>>
>> so the callbacks are invoked in the aio context it seems. can this be right?
> Yes, I took another look and I think it's fine.
Would this still be true if all aio routines are replaced by coroutines?

Peter

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 03/11] iscsi: add bdrv_co_is_allocated
  2013-07-10 14:49         ` Peter Lieven
@ 2013-07-10 14:54           ` Kevin Wolf
  0 siblings, 0 replies; 69+ messages in thread
From: Kevin Wolf @ 2013-07-10 14:54 UTC (permalink / raw)
  To: Peter Lieven; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 10.07.2013 um 16:49 hat Peter Lieven geschrieben:
> Am 10.07.2013 16:43, schrieb Kevin Wolf:
> > Am 10.07.2013 um 15:49 hat Peter Lieven geschrieben:
> >> Am 10.07.2013 11:41, schrieb Kevin Wolf:
> >>> Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
> >>>> this patch adds a coroutine for .bdrv_co_is_allocated as well as
> >>>> a generic framework that can be used to build coroutines in block/iscsi.
> >>>>
> >>>> Signed-off-by: Peter Lieven <pl@kamp.de>
> >>>> ---
> >>>>  block/iscsi.c |  123 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>>  1 file changed, 123 insertions(+)
> >>>>
> >>>> diff --git a/block/iscsi.c b/block/iscsi.c
> >>>> index 2e2455d..d31ee95 100644
> >>>> --- a/block/iscsi.c
> >>>> +++ b/block/iscsi.c
> >>>> @@ -58,6 +58,15 @@ typedef struct IscsiLun {
> >>>>      uint32_t max_unmap_bdc;
> >>>>  } IscsiLun;
> >>>>  
> >>>> +typedef struct IscsiTask {
> >>>> +    int status;
> >>>> +    int complete;
> >>>> +    int retries;
> >>>> +    int do_retry;
> >>>> +    struct scsi_task *task;
> >>>> +    Coroutine *co;
> >>>> +} IscsiTask;
> >>>> +
> >>>>  typedef struct IscsiAIOCB {
> >>>>      BlockDriverAIOCB common;
> >>>>      QEMUIOVector *qiov;
> >>>> @@ -113,6 +122,45 @@ iscsi_schedule_bh(IscsiAIOCB *acb)
> >>>>      qemu_bh_schedule(acb->bh);
> >>>>  }
> >>>>  
> >>>> +static void
> >>>> +iscsi_co_generic_cb(struct iscsi_context *iscsi, int status,
> >>>> +                        void *command_data, void *opaque)
> >>>> +{
> >>>> +    struct IscsiTask *iTask = opaque;
> >>>> +    struct scsi_task *task = command_data;
> >>>> +
> >>>> +    iTask->complete = 1;
> >>>> +    iTask->status = status;
> >>>> +    iTask->do_retry = 0;
> >>>> +    iTask->task = task;
> >>>> +
> >>>> +    if (iTask->retries-- > 0 && status == SCSI_STATUS_CHECK_CONDITION
> >>>> +        && task->sense.key == SCSI_SENSE_UNIT_ATTENTION) {
> >>>> +        iTask->do_retry = 1;
> >>>> +        goto out;
> >>>> +    }
> >>>> +
> >>>> +    if (status != SCSI_STATUS_GOOD) {
> >>>> +        error_report("iSCSI: Failure. %s", iscsi_get_error(iscsi));
> >>>> +        goto out;
> >>>> +    }
> >>>> +
> >>>> +out:
> >>>> +    if (iTask->status != SCSI_STATUS_GOOD) {
> >>>> +        scsi_free_scsi_task(iTask->task);
> >>>> +        iTask->task = NULL;
> >>>> +    }
> >>>> +    if (iTask->co) {
> >>>> +        qemu_coroutine_enter(iTask->co, NULL);
> >>>> +    }
> >>>> +}
> >>> In which context does this callback run? If this is from a separate
> >>> thread, you may not do all of this (without even holding the QBL). You
> >>> should probably use a BH to switch to the I/O thread.
> >> good question. the callbacks can only be fired when iscsi_service() is invoked.
> >> this is currently done by registereing iscsi_process_read/write/flush via qemu_aio_set_fd_handler().
> >>
> >> so the callbacks are invoked in the aio context it seems. can this be right?
> > Yes, I took another look and I think it's fine.
> Would this still be true if all aio routines are replaced by coroutines?

Yes, AIO callbacks and coroutines are mostly equivalent from the point
of view of anything external to the block driver itself.

Kevin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 10/11] iscsi: ignore aio_discard if unsupported
  2013-07-10 14:49         ` Peter Lieven
@ 2013-07-10 14:58           ` Kevin Wolf
  2013-07-10 20:31             ` Peter Lieven
  0 siblings, 1 reply; 69+ messages in thread
From: Kevin Wolf @ 2013-07-10 14:58 UTC (permalink / raw)
  To: Peter Lieven; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 10.07.2013 um 16:49 hat Peter Lieven geschrieben:
> Am 10.07.2013 16:28, schrieb Kevin Wolf:
> > Am 10.07.2013 um 16:04 hat Peter Lieven geschrieben:
> >> Am 10.07.2013 13:33, schrieb Kevin Wolf:
> >>> Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
> >>>> if the target does not support UNMAP or the request
> >>>> is too big silently ignore the discard request.
> >>>>
> >>>> Signed-off-by: Peter Lieven <pl@kamp.de>
> >>> Why not loop for the "too big" case? You can probably use the same logic
> >>> for unmapping the whole device in .bdrv_create and here.
> >> right, but looping in an aio function seemed not so trivial to me.
> >> it seems more and more obvious to me that the best would be to change
> >> all the remaining aio routines to co routines.
> > The pattern for AIO functions is that the real work of submitting
> > requests is done in the AIO callback, and it submits new AIO requests
> > calling back into the same callback as long as acb->remaining_secs > 0
> > (or something like that).
> >
> > You can still see that kind of thing alive in qed_aio_next_io(), (most
> > of?) the rest is converted to coroutines because it makes the code look
> > nicer.
> would you agree if I leave the easy version in just to fix the potential
> problems if iscsi_aio_discard is called with too high nb_sectors or
> on a storage where UNMAP is unsupported.
> 
> I will add a TODO with the comment that the limit of iscsi->max_unmap should
> be replaced by a loop once the routine is replaced by a coroutine?

Meh, another pony I don't get... ;-)

Leaving a TODO comment for now is okay with me.

> >> in this case i could add the too big logic in iscsi_co_discard and simply call
> >> it from iscsi_co_write_zeroes.
> > I think that would be the nicest solution.
> I promised to take care of this for 1.7.0 latest.

Okay, thanks.

Kevin

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [Qemu-devel] [PATCHv2 10/11] iscsi: ignore aio_discard if unsupported
  2013-07-10 14:58           ` Kevin Wolf
@ 2013-07-10 20:31             ` Peter Lieven
  0 siblings, 0 replies; 69+ messages in thread
From: Peter Lieven @ 2013-07-10 20:31 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: pbonzini, ronniesahlberg, qemu-devel, stefanha

Am 10.07.2013 16:58, schrieb Kevin Wolf:
> Am 10.07.2013 um 16:49 hat Peter Lieven geschrieben:
>> Am 10.07.2013 16:28, schrieb Kevin Wolf:
>>> Am 10.07.2013 um 16:04 hat Peter Lieven geschrieben:
>>>> Am 10.07.2013 13:33, schrieb Kevin Wolf:
>>>>> Am 27.06.2013 um 15:11 hat Peter Lieven geschrieben:
>>>>>> if the target does not support UNMAP or the request
>>>>>> is too big silently ignore the discard request.
>>>>>>
>>>>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>>>>> Why not loop for the "too big" case? You can probably use the same logic
>>>>> for unmapping the whole device in .bdrv_create and here.
>>>> right, but looping in an aio function seemed not so trivial to me.
>>>> it seems more and more obvious to me that the best would be to change
>>>> all the remaining aio routines to co routines.
>>> The pattern for AIO functions is that the real work of submitting
>>> requests is done in the AIO callback, and it submits new AIO requests
>>> calling back into the same callback as long as acb->remaining_secs > 0
>>> (or something like that).
>>>
>>> You can still see that kind of thing alive in qed_aio_next_io(), (most
>>> of?) the rest is converted to coroutines because it makes the code look
>>> nicer.
>> would you agree if I leave the easy version in just to fix the potential
>> problems if iscsi_aio_discard is called with too high nb_sectors or
>> on a storage where UNMAP is unsupported.
>>
>> I will add a TODO with the comment that the limit of iscsi->max_unmap should
>> be replaced by a loop once the routine is replaced by a coroutine?
> Meh, another pony I don't get... ;-)
>
> Leaving a TODO comment for now is okay with me.
forget about the TODO. Looking at what I have already got, implementing
iscsi_co_discard is low hanging fruit. I will implement it directly. With the
too big loop and all the benefits of the co routines.

Peter

^ permalink raw reply	[flat|nested] 69+ messages in thread

end of thread, other threads:[~2013-07-10 20:31 UTC | newest]

Thread overview: 69+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-06-27 13:11 [Qemu-devel] [PATCHv2 00/11] iscsi/qemu-img/block-migration enhancements Peter Lieven
2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 01/11] iscsi: add logical block provisioning information to iscsilun Peter Lieven
2013-07-01 13:35   ` Stefan Hajnoczi
2013-07-01 16:08     ` Peter Lieven
2013-07-10  9:19   ` Kevin Wolf
2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 02/11] iscsi: read unmap info from block limits vpd page Peter Lieven
2013-07-03  3:43   ` ronnie sahlberg
2013-07-03 21:23     ` Peter Lieven
2013-07-04 12:37       ` Paolo Bonzini
2013-07-04 21:07         ` Peter Lieven
2013-07-05  6:28           ` Paolo Bonzini
2013-07-05  7:11           ` ronnie sahlberg
2013-07-06 22:15             ` Peter Lieven
2013-07-06 23:23               ` ronnie sahlberg
2013-07-10  9:23   ` Kevin Wolf
2013-07-10  9:25   ` Kevin Wolf
2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 03/11] iscsi: add bdrv_co_is_allocated Peter Lieven
2013-07-01 13:46   ` Stefan Hajnoczi
2013-07-01 16:00     ` Peter Lieven
2013-07-10  9:41   ` Kevin Wolf
2013-07-10 13:49     ` Peter Lieven
2013-07-10 14:43       ` Kevin Wolf
2013-07-10 14:49         ` Peter Lieven
2013-07-10 14:54           ` Kevin Wolf
2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 04/11] iscsi: add bdrv_co_write_zeroes Peter Lieven
2013-07-10  9:54   ` Kevin Wolf
2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 05/11] block: add bdrv_write_zeroes() Peter Lieven
2013-07-10  9:56   ` Kevin Wolf
2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 06/11] block/raw: add bdrv_co_write_zeroes Peter Lieven
2013-07-10  9:57   ` Kevin Wolf
2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 07/11] iscsi: let bdrv_create conditionally zero out the device Peter Lieven
2013-07-01 13:58   ` Stefan Hajnoczi
2013-07-01 20:20   ` Paolo Bonzini
2013-07-01 21:36     ` Peter Lieven
2013-07-02  9:22       ` Paolo Bonzini
2013-07-02 10:36         ` Peter Lieven
2013-07-02 10:49           ` Paolo Bonzini
2013-07-02 10:56             ` Peter Lieven
2013-07-02 11:04               ` Paolo Bonzini
2013-07-02 11:18                 ` Peter Lieven
2013-07-10 10:14   ` Kevin Wolf
2013-07-10 13:52     ` Peter Lieven
2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 08/11] block-migration: efficiently encode zero blocks Peter Lieven
2013-07-01 14:13   ` Stefan Hajnoczi
2013-07-01 15:55     ` Peter Lieven
2013-07-02  7:40       ` Stefan Hajnoczi
2013-07-02 10:51         ` Paolo Bonzini
2013-07-01 16:09     ` Peter Lieven
2013-07-02  7:36       ` Stefan Hajnoczi
2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 09/11] iscsi: factor out sector conversions Peter Lieven
2013-07-10 11:29   ` Kevin Wolf
2013-07-10 14:07     ` Peter Lieven
2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 10/11] iscsi: ignore aio_discard if unsupported Peter Lieven
2013-07-10 11:33   ` Kevin Wolf
2013-07-10 14:04     ` Peter Lieven
2013-07-10 14:28       ` Kevin Wolf
2013-07-10 14:49         ` Peter Lieven
2013-07-10 14:58           ` Kevin Wolf
2013-07-10 20:31             ` Peter Lieven
2013-06-27 13:11 ` [Qemu-devel] [PATCHv2 11/11] iscsi: assert that sectors are aligned to LUN blocksize Peter Lieven
2013-07-01 14:35   ` Stefan Hajnoczi
2013-07-01 15:59     ` Peter Lieven
2013-07-02  7:44       ` Stefan Hajnoczi
2013-07-02  8:28         ` Peter Lieven
2013-07-02 10:44           ` Paolo Bonzini
2013-07-02 10:49             ` Peter Lieven
2013-07-02 10:53               ` Paolo Bonzini
2013-07-10 11:38   ` Kevin Wolf
2013-07-10 14:02     ` Peter Lieven

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.