All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation
@ 2013-12-11 21:08 Kevin Wolf
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 01/22] block: Move initialisation of BlockLimits to bdrv_refresh_limits() Kevin Wolf
                   ` (22 more replies)
  0 siblings, 23 replies; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

This patch series adds code to the block layer that allows performing
I/O requests in smaller granularities than required by the host backend
(most importantly, O_DIRECT restrictions). It achieves this for reads
by rounding the request to host-side block boundary, and for writes by
performing a read-modify-write cycle (and serialising requests
touching the same block so that the RMW doesn't write back stale data).

Originally I intended to reuse a lot of code from Paolo's previous
patch series, however as I tried to integrate pread/pwrite, which
already do a very similar thing (except for considering concurrency),
and because I wanted to implement zero-copy, most of this series ended
up being new code.

Zero-copy is possible in a common case because while XFS defauls to a
4k sector size and therefore 4k on-disk O_DIRECT alignment for 512E
disks, it still only has a 512 byte memory alignment requirement.
(Unfortunately the XFS_IOC_DIOINFO ioctl claims 4k even for memory, but
we know that the value is wrong and can probe it.)


Changes since RFC:
- Moved opt_mem_alignment into BlockLimits [Paolo]

- Changed BlockLimits in turn to work a bit more like the
  .bdrv_opt_mem_align() callback of the RFC; allows updating the
  BlockLimits later when the chain changes or bdrv_reopen() toggles
  O_DIRECT

- Fixed a typo in a commit message [Eric]


Kevin Wolf (20):
  block: Move initialisation of BlockLimits to bdrv_refresh_limits()
  block: Inherit opt_transfer_length
  block: Update BlockLimits when they might have changed
  qemu_memalign: Allow small alignments
  block: Detect unaligned length in bdrv_qiov_is_aligned()
  block: Don't use guest sector size for qemu_blockalign()
  block: Introduce bdrv_aligned_preadv()
  block: Introduce bdrv_co_do_preadv()
  block: Introduce bdrv_aligned_pwritev()
  block: write: Handle COR dependency after I/O throttling
  block: Introduce bdrv_co_do_pwritev()
  block: Switch BdrvTrackedRequest to byte granularity
  block: Allow waiting for overlapping requests between begin/end
  block: Make zero-after-EOF work with larger alignment
  block: Generalise and optimise COR serialisation
  block: Make overlap range for serialisation dynamic
  block: Align requests in bdrv_co_do_pwritev()
  block: Change coroutine wrapper to byte granularity
  block: Make bdrv_pread() a bdrv_prwv_co() wrapper
  block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper

Paolo Bonzini (2):
  block: rename buffer_alignment to guest_block_size
  raw: Probe required direct I/O alignment

 block.c                   | 602 +++++++++++++++++++++++++++++++---------------
 block/backup.c            |   7 +-
 block/iscsi.c             |  87 ++++---
 block/qcow2.c             |  11 +-
 block/qed.c               |  11 +-
 block/raw-posix.c         | 102 ++++++--
 block/raw-win32.c         |  41 ++++
 block/stream.c            |   2 +
 block/vmdk.c              |  22 +-
 hw/block/virtio-blk.c     |   2 +-
 hw/ide/core.c             |   2 +-
 hw/scsi/scsi-disk.c       |   2 +-
 hw/scsi/scsi-generic.c    |   2 +-
 include/block/block.h     |   6 +-
 include/block/block_int.h |  25 +-
 util/oslib-posix.c        |   5 +
 16 files changed, 666 insertions(+), 263 deletions(-)

-- 
1.8.1.4

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 01/22] block: Move initialisation of BlockLimits to bdrv_refresh_limits()
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-12  2:54   ` Wenchao Xia
  2013-12-12  6:17   ` Peter Lieven
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 02/22] block: Inherit opt_transfer_length Kevin Wolf
                   ` (21 subsequent siblings)
  22 siblings, 2 replies; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

This function separates filling the BlockLimits from bdrv_open(), which
allows it to call it from other operations which may change the limits
(e.g. modifications to the backing file chain or bdrv_reopen)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c                   | 21 ++++++++++++
 block/iscsi.c             | 87 ++++++++++++++++++++++++++++-------------------
 block/qcow2.c             | 11 +++++-
 block/qed.c               | 11 +++++-
 block/vmdk.c              | 22 +++++++++---
 include/block/block_int.h |  2 ++
 6 files changed, 113 insertions(+), 41 deletions(-)

diff --git a/block.c b/block.c
index 13f001a..c32d856 100644
--- a/block.c
+++ b/block.c
@@ -479,6 +479,21 @@ int bdrv_create_file(const char* filename, QEMUOptionParameter *options,
     return ret;
 }
 
+static int bdrv_refresh_limits(BlockDriverState *bs)
+{
+    BlockDriver *drv = bs->drv;
+
+    memset(&bs->bl, 0, sizeof(bs->bl));
+
+    if (!drv) {
+        return 0;
+    } else if (drv->bdrv_refresh_limits) {
+        return drv->bdrv_refresh_limits(bs);
+    }
+
+    return 0;
+}
+
 /*
  * Create a uniquely-named empty temporary file.
  * Return 0 upon success, otherwise a negative errno value.
@@ -833,6 +848,8 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
         goto free_and_fail;
     }
 
+    bdrv_refresh_limits(bs);
+
 #ifndef _WIN32
     if (bs->is_temporary) {
         assert(bs->filename[0] != '\0');
@@ -1018,6 +1035,10 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
     }
     pstrcpy(bs->backing_file, sizeof(bs->backing_file),
             bs->backing_hd->file->filename);
+
+    /* Recalculate the BlockLimits with the backing file */
+    bdrv_refresh_limits(bs);
+
     return 0;
 }
 
diff --git a/block/iscsi.c b/block/iscsi.c
index 829d444..f3ded8c 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1425,6 +1425,56 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
         task = NULL;
     }
 
+#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
+    /* Set up a timer for sending out iSCSI NOPs */
+    iscsilun->nop_timer = timer_new_ms(QEMU_CLOCK_REALTIME, iscsi_nop_timed_event, iscsilun);
+    timer_mod(iscsilun->nop_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
+#endif
+
+out:
+    qemu_opts_del(opts);
+    if (initiator_name != NULL) {
+        g_free(initiator_name);
+    }
+    if (iscsi_url != NULL) {
+        iscsi_destroy_url(iscsi_url);
+    }
+    if (task != NULL) {
+        scsi_free_scsi_task(task);
+    }
+
+    if (ret) {
+        if (iscsi != NULL) {
+            iscsi_destroy_context(iscsi);
+        }
+        memset(iscsilun, 0, sizeof(IscsiLun));
+    }
+    return ret;
+}
+
+static void iscsi_close(BlockDriverState *bs)
+{
+    IscsiLun *iscsilun = bs->opaque;
+    struct iscsi_context *iscsi = iscsilun->iscsi;
+
+    if (iscsilun->nop_timer) {
+        timer_del(iscsilun->nop_timer);
+        timer_free(iscsilun->nop_timer);
+    }
+    qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), NULL, NULL, NULL);
+    iscsi_destroy_context(iscsi);
+    g_free(iscsilun->zeroblock);
+    memset(iscsilun, 0, sizeof(IscsiLun));
+}
+
+static int iscsi_refresh_limits(BlockDriverState *bs)
+{
+    IscsiLun *iscsilun = bs->opaque;
+    struct scsi_task *task = NULL;
+    int ret;
+
+    memset(&bs->bl, 0, sizeof(bs->bl));
+
     if (iscsilun->lbp.lbpu || iscsilun->lbp.lbpws) {
         struct scsi_inquiry_block_limits *inq_bl;
         task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
@@ -1462,48 +1512,14 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
                                                      iscsilun);
     }
 
-#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
-    /* Set up a timer for sending out iSCSI NOPs */
-    iscsilun->nop_timer = timer_new_ms(QEMU_CLOCK_REALTIME, iscsi_nop_timed_event, iscsilun);
-    timer_mod(iscsilun->nop_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
-#endif
-
+    ret = 0;
 out:
-    qemu_opts_del(opts);
-    if (initiator_name != NULL) {
-        g_free(initiator_name);
-    }
-    if (iscsi_url != NULL) {
-        iscsi_destroy_url(iscsi_url);
-    }
     if (task != NULL) {
         scsi_free_scsi_task(task);
     }
-
-    if (ret) {
-        if (iscsi != NULL) {
-            iscsi_destroy_context(iscsi);
-        }
-        memset(iscsilun, 0, sizeof(IscsiLun));
-    }
     return ret;
 }
 
-static void iscsi_close(BlockDriverState *bs)
-{
-    IscsiLun *iscsilun = bs->opaque;
-    struct iscsi_context *iscsi = iscsilun->iscsi;
-
-    if (iscsilun->nop_timer) {
-        timer_del(iscsilun->nop_timer);
-        timer_free(iscsilun->nop_timer);
-    }
-    qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), NULL, NULL, NULL);
-    iscsi_destroy_context(iscsi);
-    g_free(iscsilun->zeroblock);
-    memset(iscsilun, 0, sizeof(IscsiLun));
-}
-
 static int iscsi_truncate(BlockDriverState *bs, int64_t offset)
 {
     IscsiLun *iscsilun = bs->opaque;
@@ -1616,6 +1632,7 @@ static BlockDriver bdrv_iscsi = {
     .bdrv_getlength  = iscsi_getlength,
     .bdrv_get_info   = iscsi_get_info,
     .bdrv_truncate   = iscsi_truncate,
+    .bdrv_refresh_limits = iscsi_refresh_limits,
 
 #if defined(LIBISCSI_FEATURE_IOVECTOR)
     .bdrv_co_get_block_status = iscsi_co_get_block_status,
diff --git a/block/qcow2.c b/block/qcow2.c
index f29aa88..f40355c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -718,7 +718,6 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     }
 
     qemu_opts_del(opts);
-    bs->bl.write_zeroes_alignment = s->cluster_sectors;
 
     if (s->use_lazy_refcounts && s->qcow_version < 3) {
         error_setg(errp, "Lazy refcounts require a qcow2 image with at least "
@@ -751,6 +750,15 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
     return ret;
 }
 
+static int qcow2_refresh_limits(BlockDriverState *bs)
+{
+    BDRVQcowState *s = bs->opaque;
+
+    bs->bl.write_zeroes_alignment = s->cluster_sectors;
+
+    return 0;
+}
+
 static int qcow2_set_key(BlockDriverState *bs, const char *key)
 {
     BDRVQcowState *s = bs->opaque;
@@ -2268,6 +2276,7 @@ static BlockDriver bdrv_qcow2 = {
 
     .bdrv_change_backing_file   = qcow2_change_backing_file,
 
+    .bdrv_refresh_limits        = qcow2_refresh_limits,
     .bdrv_invalidate_cache      = qcow2_invalidate_cache,
 
     .create_options = qcow2_create_options,
diff --git a/block/qed.c b/block/qed.c
index 450a1fa..d35bd1c 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -495,7 +495,6 @@ static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
         }
     }
 
-    bs->bl.write_zeroes_alignment = s->header.cluster_size >> BDRV_SECTOR_BITS;
     s->need_check_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
                                             qed_need_check_timer_cb, s);
 
@@ -507,6 +506,15 @@ out:
     return ret;
 }
 
+static int bdrv_qed_refresh_limits(BlockDriverState *bs)
+{
+    BDRVQEDState *s = bs->opaque;
+
+    bs->bl.write_zeroes_alignment = s->header.cluster_size >> BDRV_SECTOR_BITS;
+
+    return 0;
+}
+
 /* We have nothing to do for QED reopen, stubs just return
  * success */
 static int bdrv_qed_reopen_prepare(BDRVReopenState *state,
@@ -1616,6 +1624,7 @@ static BlockDriver bdrv_qed = {
     .bdrv_truncate            = bdrv_qed_truncate,
     .bdrv_getlength           = bdrv_qed_getlength,
     .bdrv_get_info            = bdrv_qed_get_info,
+    .bdrv_refresh_limits      = bdrv_qed_refresh_limits,
     .bdrv_change_backing_file = bdrv_qed_change_backing_file,
     .bdrv_invalidate_cache    = bdrv_qed_invalidate_cache,
     .bdrv_check               = bdrv_qed_check,
diff --git a/block/vmdk.c b/block/vmdk.c
index 0734bc2..f8a387f 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -428,10 +428,6 @@ static int vmdk_add_extent(BlockDriverState *bs,
     extent->l2_size = l2_size;
     extent->cluster_sectors = flat ? sectors : cluster_sectors;
 
-    if (!flat) {
-        bs->bl.write_zeroes_alignment =
-            MAX(bs->bl.write_zeroes_alignment, cluster_sectors);
-    }
     if (s->num_extents > 1) {
         extent->end_sector = (*(extent - 1)).end_sector + extent->sectors;
     } else {
@@ -886,6 +882,23 @@ fail:
     return ret;
 }
 
+
+static int vmdk_refresh_limits(BlockDriverState *bs)
+{
+    BDRVVmdkState *s = bs->opaque;
+    int i;
+
+    for (i = 0; i < s->num_extents; i++) {
+        if (!s->extents[i].flat) {
+            bs->bl.write_zeroes_alignment =
+                MAX(bs->bl.write_zeroes_alignment,
+                    s->extents[i].cluster_sectors);
+        }
+    }
+
+    return 0;
+}
+
 static int get_whole_cluster(BlockDriverState *bs,
                 VmdkExtent *extent,
                 uint64_t cluster_offset,
@@ -1971,6 +1984,7 @@ static BlockDriver bdrv_vmdk = {
     .bdrv_get_allocated_file_size = vmdk_get_allocated_file_size,
     .bdrv_has_zero_init           = vmdk_has_zero_init,
     .bdrv_get_specific_info       = vmdk_get_specific_info,
+    .bdrv_refresh_limits          = vmdk_refresh_limits,
 
     .create_options               = vmdk_create_options,
 };
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 8b132d7..c49fa6b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -226,6 +226,8 @@ struct BlockDriver {
     int (*bdrv_debug_resume)(BlockDriverState *bs, const char *tag);
     bool (*bdrv_debug_is_suspended)(BlockDriverState *bs, const char *tag);
 
+    int (*bdrv_refresh_limits)(BlockDriverState *bs);
+
     /*
      * Returns 1 if newly created images are guaranteed to contain only
      * zeros, 0 otherwise.
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 02/22] block: Inherit opt_transfer_length
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 01/22] block: Move initialisation of BlockLimits to bdrv_refresh_limits() Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-12  3:02   ` Wenchao Xia
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 03/22] block: Update BlockLimits when they might have changed Kevin Wolf
                   ` (20 subsequent siblings)
  22 siblings, 1 reply; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

When there is a format driver between the backend, it's not guaranteed
that exposing the opt_transfer_length for the format driver results in
the optimal requests (because of fragmentation etc.), but it can't make
things worse, so let's just do it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index c32d856..79a325f 100644
--- a/block.c
+++ b/block.c
@@ -487,7 +487,23 @@ static int bdrv_refresh_limits(BlockDriverState *bs)
 
     if (!drv) {
         return 0;
-    } else if (drv->bdrv_refresh_limits) {
+    }
+
+    /* Take some limits from the children as a default */
+    if (bs->file) {
+        bdrv_refresh_limits(bs->file);
+        bs->bl.opt_transfer_length = bs->file->bl.opt_transfer_length;
+    }
+
+    if (bs->backing_hd) {
+        bdrv_refresh_limits(bs->backing_hd);
+        bs->bl.opt_transfer_length =
+            MAX(bs->bl.opt_transfer_length,
+                bs->backing_hd->bl.opt_transfer_length);
+    }
+
+    /* Then let the driver override it */
+    if (drv->bdrv_refresh_limits) {
         return drv->bdrv_refresh_limits(bs);
     }
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 03/22] block: Update BlockLimits when they might have changed
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 01/22] block: Move initialisation of BlockLimits to bdrv_refresh_limits() Kevin Wolf
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 02/22] block: Inherit opt_transfer_length Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-12  3:09   ` Wenchao Xia
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 04/22] qemu_memalign: Allow small alignments Kevin Wolf
                   ` (19 subsequent siblings)
  22 siblings, 1 reply; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

When reopening with different flags, or when backing files disappear
from the chain, the limits may change. Make sure they get updated in
these cases.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c               | 5 ++++-
 block/stream.c        | 2 ++
 include/block/block.h | 1 +
 3 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 79a325f..8eae359 100644
--- a/block.c
+++ b/block.c
@@ -479,7 +479,7 @@ int bdrv_create_file(const char* filename, QEMUOptionParameter *options,
     return ret;
 }
 
-static int bdrv_refresh_limits(BlockDriverState *bs)
+int bdrv_refresh_limits(BlockDriverState *bs)
 {
     BlockDriver *drv = bs->drv;
 
@@ -1464,6 +1464,8 @@ void bdrv_reopen_commit(BDRVReopenState *reopen_state)
     reopen_state->bs->enable_write_cache = !!(reopen_state->flags &
                                               BDRV_O_CACHE_WB);
     reopen_state->bs->read_only = !(reopen_state->flags & BDRV_O_RDWR);
+
+    bdrv_refresh_limits(reopen_state->bs);
 }
 
 /*
@@ -2261,6 +2263,7 @@ int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
     }
     new_top_bs->backing_hd = base_bs;
 
+    bdrv_refresh_limits(new_top_bs);
 
     QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, next) {
         /* so that bdrv_close() does not recursively close the chain */
diff --git a/block/stream.c b/block/stream.c
index 46bec7d..dd0b4ac 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -75,6 +75,8 @@ static void close_unused_images(BlockDriverState *top, BlockDriverState *base,
         unused->backing_hd = NULL;
         bdrv_unref(unused);
     }
+
+    bdrv_refresh_limits(top);
 }
 
 static void coroutine_fn stream_run(void *opaque)
diff --git a/include/block/block.h b/include/block/block.h
index 36efaea..64fb319 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -249,6 +249,7 @@ int bdrv_truncate(BlockDriverState *bs, int64_t offset);
 int64_t bdrv_getlength(BlockDriverState *bs);
 int64_t bdrv_get_allocated_file_size(BlockDriverState *bs);
 void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr);
+int bdrv_refresh_limits(BlockDriverState *bs);
 int bdrv_commit(BlockDriverState *bs);
 int bdrv_commit_all(void);
 int bdrv_change_backing_file(BlockDriverState *bs,
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 04/22] qemu_memalign: Allow small alignments
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
                   ` (2 preceding siblings ...)
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 03/22] block: Update BlockLimits when they might have changed Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-12  3:09   ` Wenchao Xia
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 05/22] block: Detect unaligned length in bdrv_qiov_is_aligned() Kevin Wolf
                   ` (18 subsequent siblings)
  22 siblings, 1 reply; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

The functions used by qemu_memalign() require an alignment that is at
least sizeof(void*). Adjust it if it is too small.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 util/oslib-posix.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index e00a44c..54f8932 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -85,6 +85,11 @@ void *qemu_oom_check(void *ptr)
 void *qemu_memalign(size_t alignment, size_t size)
 {
     void *ptr;
+
+    if (alignment < sizeof(void*)) {
+        alignment = sizeof(void*);
+    }
+
 #if defined(_POSIX_C_SOURCE) && !defined(__sun__)
     int ret;
     ret = posix_memalign(&ptr, alignment, size);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 05/22] block: Detect unaligned length in bdrv_qiov_is_aligned()
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
                   ` (3 preceding siblings ...)
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 04/22] qemu_memalign: Allow small alignments Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-12  3:15   ` Wenchao Xia
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 06/22] block: Don't use guest sector size for qemu_blockalign() Kevin Wolf
                   ` (17 subsequent siblings)
  22 siblings, 1 reply; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

For an O_DIRECT request to succeed, it's not only necessary that all
base addresses in the qiov are aligned, but also that each length in it
is aligned.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block.c b/block.c
index 8eae359..62c5a75 100644
--- a/block.c
+++ b/block.c
@@ -4561,6 +4561,9 @@ bool bdrv_qiov_is_aligned(BlockDriverState *bs, QEMUIOVector *qiov)
         if ((uintptr_t) qiov->iov[i].iov_base % bs->buffer_alignment) {
             return false;
         }
+        if (qiov->iov[i].iov_len % bs->buffer_alignment) {
+            return false;
+        }
     }
 
     return true;
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 06/22] block: Don't use guest sector size for qemu_blockalign()
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
                   ` (4 preceding siblings ...)
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 05/22] block: Detect unaligned length in bdrv_qiov_is_aligned() Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-12  6:32   ` Wenchao Xia
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 07/22] block: rename buffer_alignment to guest_block_size Kevin Wolf
                   ` (16 subsequent siblings)
  22 siblings, 1 reply; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

bs->buffer_alignment is set by the device emulation and contains the
logical block size of the guest device. This isn't something that the
block layer should know, and even less something to use for determining
the right alignment of buffers to be used for the host.

The new BlockLimits field opt_mem_alignment tells the qemu block layer
the optimal alignment to be used so that no bounce buffer must be used
in the driver.

This patch may change the buffer alignment from 4k to 512 for all
callers that used qemu_blockalign() with the top-level image format
BlockDriverState. The value was never propagated to other levels in the
tree, so in particular raw-posix never required anything else than 512.

While on disks with 4k sectors direct I/O requires a 4k alignment,
memory may still be okay when aligned to 512 byte boundaries. This is
what must have happened in practice, because otherwise this would
already have failed earlier. Therefore I don't expect regressions even
with this intermediate state. Later, raw-posix can implement the hook
and expose a different memory alignment requirement.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c                   | 23 ++++++++++++++++++++---
 include/block/block.h     |  3 +++
 include/block/block_int.h |  3 +++
 3 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index 62c5a75..d50a4e4 100644
--- a/block.c
+++ b/block.c
@@ -214,6 +214,16 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
     qemu_co_queue_next(&bs->throttled_reqs[is_write]);
 }
 
+size_t bdrv_opt_mem_align(BlockDriverState *bs)
+{
+    if (!bs || !bs->drv) {
+        /* 4k should be on the safe side */
+        return 4096;
+    }
+
+    return bs->bl.opt_mem_alignment;
+}
+
 /* check if the path starts with "<protocol>:" */
 static int path_has_protocol(const char *path)
 {
@@ -493,6 +503,9 @@ int bdrv_refresh_limits(BlockDriverState *bs)
     if (bs->file) {
         bdrv_refresh_limits(bs->file);
         bs->bl.opt_transfer_length = bs->file->bl.opt_transfer_length;
+        bs->bl.opt_mem_alignment = bs->file->bl.opt_mem_alignment;
+    } else {
+        bs->bl.opt_mem_alignment = 512;
     }
 
     if (bs->backing_hd) {
@@ -500,6 +513,9 @@ int bdrv_refresh_limits(BlockDriverState *bs)
         bs->bl.opt_transfer_length =
             MAX(bs->bl.opt_transfer_length,
                 bs->backing_hd->bl.opt_transfer_length);
+        bs->bl.opt_mem_alignment =
+            MAX(bs->bl.opt_mem_alignment,
+                bs->backing_hd->bl.opt_mem_alignment);
     }
 
     /* Then let the driver override it */
@@ -4547,7 +4563,7 @@ void bdrv_set_buffer_alignment(BlockDriverState *bs, int align)
 
 void *qemu_blockalign(BlockDriverState *bs, size_t size)
 {
-    return qemu_memalign((bs && bs->buffer_alignment) ? bs->buffer_alignment : 512, size);
+    return qemu_memalign(bdrv_opt_mem_align(bs), size);
 }
 
 /*
@@ -4556,12 +4572,13 @@ void *qemu_blockalign(BlockDriverState *bs, size_t size)
 bool bdrv_qiov_is_aligned(BlockDriverState *bs, QEMUIOVector *qiov)
 {
     int i;
+    size_t alignment = bdrv_opt_mem_align(bs);
 
     for (i = 0; i < qiov->niov; i++) {
-        if ((uintptr_t) qiov->iov[i].iov_base % bs->buffer_alignment) {
+        if ((uintptr_t) qiov->iov[i].iov_base % alignment) {
             return false;
         }
-        if (qiov->iov[i].iov_len % bs->buffer_alignment) {
+        if (qiov->iov[i].iov_len % alignment) {
             return false;
         }
     }
diff --git a/include/block/block.h b/include/block/block.h
index 64fb319..cf63ee2 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -419,6 +419,9 @@ void bdrv_img_create(const char *filename, const char *fmt,
                      char *options, uint64_t img_size, int flags,
                      Error **errp, bool quiet);
 
+/* Returns the alignment in bytes that is required so that no bounce buffer
+ * is required throughout the stack */
+size_t bdrv_opt_mem_align(BlockDriverState *bs);
 void bdrv_set_buffer_alignment(BlockDriverState *bs, int align);
 void *qemu_blockalign(BlockDriverState *bs, size_t size);
 bool bdrv_qiov_is_aligned(BlockDriverState *bs, QEMUIOVector *qiov);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index c49fa6b..6ffae7c 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -252,6 +252,9 @@ typedef struct BlockLimits {
 
     /* optimal transfer length in sectors */
     int opt_transfer_length;
+
+    /* memory alignment so that no bounce buffer is needed */
+    size_t opt_mem_alignment;
 } BlockLimits;
 
 /*
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 07/22] block: rename buffer_alignment to guest_block_size
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
                   ` (5 preceding siblings ...)
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 06/22] block: Don't use guest sector size for qemu_blockalign() Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-12  6:34   ` Wenchao Xia
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 08/22] raw: Probe required direct I/O alignment Kevin Wolf
                   ` (15 subsequent siblings)
  22 siblings, 1 reply; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

From: Paolo Bonzini <pbonzini@redhat.com>

The alignment field is now set to the value that is promised to the
guest, rather than required by the host.  The next patches will make
QEMU aware of the host-provided values, so make this clear.

The alignment is also not about memory buffers, but about the sectors on
the disk, change the documentation of the field.

At this point, the field is set by the device emulation, but completely
ignored by the block layer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c                   | 10 +++++-----
 hw/block/virtio-blk.c     |  2 +-
 hw/ide/core.c             |  2 +-
 hw/scsi/scsi-disk.c       |  2 +-
 hw/scsi/scsi-generic.c    |  2 +-
 include/block/block.h     |  2 +-
 include/block/block_int.h |  4 ++--
 7 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index d50a4e4..3504d17 100644
--- a/block.c
+++ b/block.c
@@ -812,7 +812,7 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
     }
 
     bs->open_flags = flags;
-    bs->buffer_alignment = 512;
+    bs->guest_block_size = 512;
     bs->zero_beyond_eof = true;
     open_flags = bdrv_open_flags(bs, flags);
     bs->read_only = !(open_flags & BDRV_O_RDWR);
@@ -1648,7 +1648,7 @@ static void bdrv_move_feature_fields(BlockDriverState *bs_dest,
     bs_dest->dev_ops            = bs_src->dev_ops;
     bs_dest->dev_opaque         = bs_src->dev_opaque;
     bs_dest->dev                = bs_src->dev;
-    bs_dest->buffer_alignment   = bs_src->buffer_alignment;
+    bs_dest->guest_block_size   = bs_src->guest_block_size;
     bs_dest->copy_on_read       = bs_src->copy_on_read;
 
     bs_dest->enable_write_cache = bs_src->enable_write_cache;
@@ -1800,7 +1800,7 @@ void bdrv_detach_dev(BlockDriverState *bs, void *dev)
     bs->dev = NULL;
     bs->dev_ops = NULL;
     bs->dev_opaque = NULL;
-    bs->buffer_alignment = 512;
+    bs->guest_block_size = 512;
 }
 
 /* TODO change to return DeviceState * when all users are qdevified */
@@ -4556,9 +4556,9 @@ BlockDriverAIOCB *bdrv_aio_ioctl(BlockDriverState *bs,
     return NULL;
 }
 
-void bdrv_set_buffer_alignment(BlockDriverState *bs, int align)
+void bdrv_set_guest_block_size(BlockDriverState *bs, int align)
 {
-    bs->buffer_alignment = align;
+    bs->guest_block_size = align;
 }
 
 void *qemu_blockalign(BlockDriverState *bs, size_t size)
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 13f6d82..323e9ec 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -720,7 +720,7 @@ static int virtio_blk_device_init(VirtIODevice *vdev)
     register_savevm(qdev, "virtio-blk", virtio_blk_id++, 2,
                     virtio_blk_save, virtio_blk_load, s);
     bdrv_set_dev_ops(s->bs, &virtio_block_ops, s);
-    bdrv_set_buffer_alignment(s->bs, s->conf->logical_block_size);
+    bdrv_set_guest_block_size(s->bs, s->conf->logical_block_size);
 
     bdrv_iostatus_enable(s->bs);
 
diff --git a/hw/ide/core.c b/hw/ide/core.c
index e1f4c33..036cd4a 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -2103,7 +2103,7 @@ int ide_init_drive(IDEState *s, BlockDriverState *bs, IDEDriveKind kind,
     s->smart_selftest_count = 0;
     if (kind == IDE_CD) {
         bdrv_set_dev_ops(bs, &ide_cd_block_ops, s);
-        bdrv_set_buffer_alignment(bs, 2048);
+        bdrv_set_guest_block_size(bs, 2048);
     } else {
         if (!bdrv_is_inserted(s->bs)) {
             error_report("Device needs media, but drive is empty");
diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c
index efadfc0..6cf6040 100644
--- a/hw/scsi/scsi-disk.c
+++ b/hw/scsi/scsi-disk.c
@@ -2225,7 +2225,7 @@ static int scsi_initfn(SCSIDevice *dev)
     } else {
         bdrv_set_dev_ops(s->qdev.conf.bs, &scsi_disk_block_ops, s);
     }
-    bdrv_set_buffer_alignment(s->qdev.conf.bs, s->qdev.blocksize);
+    bdrv_set_guest_block_size(s->qdev.conf.bs, s->qdev.blocksize);
 
     bdrv_iostatus_enable(s->qdev.conf.bs);
     add_boot_device_path(s->qdev.conf.bootindex, &dev->qdev, NULL);
diff --git a/hw/scsi/scsi-generic.c b/hw/scsi/scsi-generic.c
index 8f195be..f08b64e 100644
--- a/hw/scsi/scsi-generic.c
+++ b/hw/scsi/scsi-generic.c
@@ -210,7 +210,7 @@ static void scsi_read_complete(void * opaque, int ret)
             s->blocksize = ldl_be_p(&r->buf[8]);
             s->max_lba = ldq_be_p(&r->buf[0]);
         }
-        bdrv_set_buffer_alignment(s->conf.bs, s->blocksize);
+        bdrv_set_guest_block_size(s->conf.bs, s->blocksize);
 
         scsi_req_data(&r->req, len);
         if (!r->req.io_canceled) {
diff --git a/include/block/block.h b/include/block/block.h
index cf63ee2..05252d5 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -422,7 +422,7 @@ void bdrv_img_create(const char *filename, const char *fmt,
 /* Returns the alignment in bytes that is required so that no bounce buffer
  * is required throughout the stack */
 size_t bdrv_opt_mem_align(BlockDriverState *bs);
-void bdrv_set_buffer_alignment(BlockDriverState *bs, int align);
+void bdrv_set_guest_block_size(BlockDriverState *bs, int align);
 void *qemu_blockalign(BlockDriverState *bs, size_t size);
 bool bdrv_qiov_is_aligned(BlockDriverState *bs, QEMUIOVector *qiov);
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 6ffae7c..9921cd3 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -319,8 +319,8 @@ struct BlockDriverState {
     /* Whether produces zeros when read beyond eof */
     bool zero_beyond_eof;
 
-    /* the memory alignment required for the buffers handled by this driver */
-    int buffer_alignment;
+    /* the block size for which the guest device expects atomicity */
+    int guest_block_size;
 
     /* do we need to tell the quest if we have a volatile write cache? */
     int enable_write_cache;
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 08/22] raw: Probe required direct I/O alignment
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
                   ` (6 preceding siblings ...)
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 07/22] block: rename buffer_alignment to guest_block_size Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 09/22] block: Introduce bdrv_aligned_preadv() Kevin Wolf
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

From: Paolo Bonzini <pbonzini@redhat.com>

Add a bs->request_alignment field that contains the required
offset/length alignment for I/O requests and fill it in the raw block
drivers. Use ioctls if possible, else see what alignment it takes for
O_DIRECT to succeed.

While at it, also expose the memory alignment requirements, which may be
(and in practice are) different from the disk alignment requirements.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c                   |   3 ++
 block/raw-posix.c         | 102 ++++++++++++++++++++++++++++++++++++++--------
 block/raw-win32.c         |  41 +++++++++++++++++++
 include/block/block_int.h |   3 ++
 4 files changed, 132 insertions(+), 17 deletions(-)

diff --git a/block.c b/block.c
index 3504d17..b86e754 100644
--- a/block.c
+++ b/block.c
@@ -813,6 +813,7 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
 
     bs->open_flags = flags;
     bs->guest_block_size = 512;
+    bs->request_alignment = 512;
     bs->zero_beyond_eof = true;
     open_flags = bdrv_open_flags(bs, flags);
     bs->read_only = !(open_flags & BDRV_O_RDWR);
@@ -881,6 +882,8 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
     }
 
     bdrv_refresh_limits(bs);
+    assert(bdrv_opt_mem_align(bs) != 0);
+    assert(bs->request_alignment != 0);
 
 #ifndef _WIN32
     if (bs->is_temporary) {
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 10c6b34..e8e75a7 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -127,6 +127,8 @@ typedef struct BDRVRawState {
     int fd;
     int type;
     int open_flags;
+    size_t buf_align;
+
 #if defined(__linux__)
     /* linux floppy specific */
     int64_t fd_open_time;
@@ -213,6 +215,76 @@ static int raw_normalize_devicepath(const char **filename)
 }
 #endif
 
+static void raw_probe_alignment(BlockDriverState *bs)
+{
+    BDRVRawState *s = bs->opaque;
+    char *buf;
+    unsigned int sector_size;
+
+    /* For /dev/sg devices the alignment is not really used.
+       With buffered I/O, we don't have any restrictions. */
+    if (bs->sg || !(s->open_flags & O_DIRECT)) {
+        bs->request_alignment = 1;
+        s->buf_align = 1;
+        return;
+    }
+
+    /* Try a few ioctls to get the right size */
+    bs->request_alignment = 0;
+    s->buf_align = 0;
+
+#ifdef BLKSSZGET
+    if (ioctl(s->fd, BLKSSZGET, &sector_size) >= 0) {
+        bs->request_alignment = sector_size;
+    }
+#endif
+#ifdef DKIOCGETBLOCKSIZE
+    if (ioctl(s->fd, DKIOCGETBLOCKSIZE, &sector_size) >= 0) {
+        bs->request_alignment = sector_size;
+    }
+#endif
+#ifdef DIOCGSECTORSIZE
+    if (ioctl(s->fd, DIOCGSECTORSIZE, &sector_size) >= 0) {
+        bs->request_alignment = sector_size;
+    }
+#endif
+#ifdef CONFIG_XFS
+    if (s->is_xfs) {
+        struct dioattr da;
+        if (xfsctl(NULL, s->fd, XFS_IOC_DIOINFO, &da) >= 0) {
+            bs->request_alignment = da.d_miniosz;
+            /* The kernel returns wrong information for d_mem */
+            /* s->buf_align = da.d_mem; */
+        }
+    }
+#endif
+
+    /* If we could not get the sizes so far, we can only guess them */
+    if (!s->buf_align) {
+        size_t align;
+        buf = qemu_memalign(MAX_BLOCKSIZE, 2 * MAX_BLOCKSIZE);
+        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
+            if (pread(s->fd, buf + align, MAX_BLOCKSIZE, 0) >= 0) {
+                s->buf_align = align;
+                break;
+            }
+        }
+        qemu_vfree(buf);
+    }
+
+    if (!bs->request_alignment) {
+        size_t align;
+        buf = qemu_memalign(s->buf_align, MAX_BLOCKSIZE);
+        for (align = 512; align <= MAX_BLOCKSIZE; align <<= 1) {
+            if (pread(s->fd, buf, align, 0) >= 0) {
+                bs->request_alignment = align;
+                break;
+            }
+        }
+        qemu_vfree(buf);
+    }
+}
+
 static void raw_parse_flags(int bdrv_flags, int *open_flags)
 {
     assert(open_flags != NULL);
@@ -463,7 +535,6 @@ static int raw_reopen_prepare(BDRVReopenState *state,
     return ret;
 }
 
-
 static void raw_reopen_commit(BDRVReopenState *state)
 {
     BDRVRawReopenState *raw_s = state->opaque;
@@ -499,23 +570,15 @@ static void raw_reopen_abort(BDRVReopenState *state)
     state->opaque = NULL;
 }
 
+static int raw_refresh_limits(BlockDriverState *bs)
+{
+    BDRVRawState *s = bs->opaque;
 
-/* XXX: use host sector size if necessary with:
-#ifdef DIOCGSECTORSIZE
-        {
-            unsigned int sectorsize = 512;
-            if (!ioctl(fd, DIOCGSECTORSIZE, &sectorsize) &&
-                sectorsize > bufsize)
-                bufsize = sectorsize;
-        }
-#endif
-#ifdef CONFIG_COCOA
-        uint32_t blockSize = 512;
-        if ( !ioctl( fd, DKIOCGETBLOCKSIZE, &blockSize ) && blockSize > bufsize) {
-            bufsize = blockSize;
-        }
-#endif
-*/
+    raw_probe_alignment(bs);
+    bs->bl.opt_mem_alignment = s->buf_align;
+
+    return 0;
+}
 
 static ssize_t handle_aiocb_ioctl(RawPosixAIOData *aiocb)
 {
@@ -1363,6 +1426,7 @@ static BlockDriver bdrv_file = {
     .bdrv_aio_writev = raw_aio_writev,
     .bdrv_aio_flush = raw_aio_flush,
     .bdrv_aio_discard = raw_aio_discard,
+    .bdrv_refresh_limits = raw_refresh_limits,
 
     .bdrv_truncate = raw_truncate,
     .bdrv_getlength = raw_getlength,
@@ -1740,6 +1804,7 @@ static BlockDriver bdrv_host_device = {
     .bdrv_aio_writev	= raw_aio_writev,
     .bdrv_aio_flush	= raw_aio_flush,
     .bdrv_aio_discard   = hdev_aio_discard,
+    .bdrv_refresh_limits = raw_refresh_limits,
 
     .bdrv_truncate      = raw_truncate,
     .bdrv_getlength	= raw_getlength,
@@ -1871,6 +1936,7 @@ static BlockDriver bdrv_host_floppy = {
     .bdrv_aio_readv     = raw_aio_readv,
     .bdrv_aio_writev    = raw_aio_writev,
     .bdrv_aio_flush	= raw_aio_flush,
+    .bdrv_refresh_limits = raw_refresh_limits,
 
     .bdrv_truncate      = raw_truncate,
     .bdrv_getlength      = raw_getlength,
@@ -1981,6 +2047,7 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_aio_readv     = raw_aio_readv,
     .bdrv_aio_writev    = raw_aio_writev,
     .bdrv_aio_flush	= raw_aio_flush,
+    .bdrv_refresh_limits = raw_refresh_limits,
 
     .bdrv_truncate      = raw_truncate,
     .bdrv_getlength      = raw_getlength,
@@ -2110,6 +2177,7 @@ static BlockDriver bdrv_host_cdrom = {
     .bdrv_aio_readv     = raw_aio_readv,
     .bdrv_aio_writev    = raw_aio_writev,
     .bdrv_aio_flush	= raw_aio_flush,
+    .bdrv_refresh_limits = raw_refresh_limits,
 
     .bdrv_truncate      = raw_truncate,
     .bdrv_getlength      = raw_getlength,
diff --git a/block/raw-win32.c b/block/raw-win32.c
index 2bad5a3..6fe64d2 100644
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -202,6 +202,35 @@ static int set_sparse(int fd)
 				 NULL, 0, NULL, 0, &returned, NULL);
 }
 
+static void raw_probe_alignment(BlockDriverState *bs)
+{
+    BDRVRawState *s = bs->opaque;
+    DWORD sectorsPerCluster, freeClusters, totalClusters, count;
+    DISK_GEOMETRY_EX dg;
+    BOOL status;
+
+    if (s->type == FTYPE_CD) {
+        bs->request_alignment = 2048;
+        return;
+    }
+    if (s->type == FTYPE_HARDDISK) {
+        status = DeviceIoControl(s->hfile, IOCTL_DISK_GET_DRIVE_GEOMETRY_EX,
+                                 NULL, 0, &dg, sizeof(dg), &count, NULL);
+        if (status != 0) {
+            bs->request_alignment = dg.Geometry.BytesPerSector;
+            return;
+        }
+        /* try GetDiskFreeSpace too */
+    }
+
+    if (s->drive_path[0]) {
+        GetDiskFreeSpace(s->drive_path, &sectorsPerCluster,
+                         &dg.Geometry.BytesPerSector,
+                         &freeClusters, &totalClusters);
+        bs->request_alignment = dg.Geometry.BytesPerSector;
+    }
+}
+
 static void raw_parse_flags(int flags, int *access_flags, DWORD *overlapped)
 {
     assert(access_flags != NULL);
@@ -269,6 +298,17 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
         }
     }
 
+    if (filename[0] && filename[1] == ':') {
+        snprintf(s->drive_path, sizeof(s->drive_path), "%c:\\", filename[0]);
+    } else if (filename[0] == '\\' && filename[1] == '\\') {
+        s->drive_path[0] = 0;
+    } else {
+        /* Relative path.  */
+        char buf[MAX_PATH];
+        GetCurrentDirectory(MAX_PATH, buf);
+        snprintf(s->drive_path, sizeof(s->drive_path), "%c:\\", buf[0]);
+    }
+
     s->hfile = CreateFile(filename, access_flags,
                           FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, overlapped, NULL);
@@ -293,6 +333,7 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
         s->aio = aio;
     }
 
+    raw_probe_alignment(bs);
     ret = 0;
 fail:
     qemu_opts_del(opts);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 9921cd3..0a01b69 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -319,6 +319,9 @@ struct BlockDriverState {
     /* Whether produces zeros when read beyond eof */
     bool zero_beyond_eof;
 
+    /* Alignment requirement for offset/length of I/O requests */
+    unsigned int request_alignment;
+
     /* the block size for which the guest device expects atomicity */
     int guest_block_size;
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 09/22] block: Introduce bdrv_aligned_preadv()
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
                   ` (7 preceding siblings ...)
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 08/22] raw: Probe required direct I/O alignment Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 10/22] block: Introduce bdrv_co_do_preadv() Kevin Wolf
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

This separates the part of bdrv_co_do_readv() that needs to happen
before the request is modified to match the backend alignment, and a
part that needs to be executed afterwards and passes the request to the
BlockDriver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 61 +++++++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 43 insertions(+), 18 deletions(-)

diff --git a/block.c b/block.c
index b86e754..cbc9f6d 100644
--- a/block.c
+++ b/block.c
@@ -2700,26 +2700,24 @@ err:
 }
 
 /*
- * Handle a read request in coroutine context
+ * Forwards an already correctly aligned request to the BlockDriver. This
+ * handles copy on read and zeroing after EOF; any other features must be
+ * implemented by the caller.
  */
-static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
-    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
-    BdrvRequestFlags flags)
+static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
+    int64_t offset, unsigned int bytes, QEMUIOVector *qiov, int flags)
 {
     BlockDriver *drv = bs->drv;
     BdrvTrackedRequest req;
     int ret;
 
-    if (!drv) {
-        return -ENOMEDIUM;
-    }
-    if (bdrv_check_request(bs, sector_num, nb_sectors)) {
-        return -EIO;
-    }
+    int64_t sector_num = offset >> BDRV_SECTOR_BITS;
+    unsigned int nb_sectors = bytes >> BDRV_SECTOR_BITS;
 
-    if (bs->copy_on_read) {
-        flags |= BDRV_REQ_COPY_ON_READ;
-    }
+    assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
+    assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+
+    /* Handle Copy on Read and associated serialisation */
     if (flags & BDRV_REQ_COPY_ON_READ) {
         bs->copy_on_read_in_flight++;
     }
@@ -2728,11 +2726,6 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
         wait_for_overlapping_requests(bs, sector_num, nb_sectors);
     }
 
-    /* throttling disk I/O */
-    if (bs->io_limits_enabled) {
-        bdrv_io_limits_intercept(bs, nb_sectors, false);
-    }
-
     tracked_request_begin(&req, bs, sector_num, nb_sectors, false);
 
     if (flags & BDRV_REQ_COPY_ON_READ) {
@@ -2749,6 +2742,7 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
         }
     }
 
+    /* Forward the request to the BlockDriver */
     if (!(bs->zero_beyond_eof && bs->growable)) {
         ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
     } else {
@@ -2789,6 +2783,37 @@ out:
     return ret;
 }
 
+/*
+ * Handle a read request in coroutine context
+ */
+static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
+    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
+    BdrvRequestFlags flags)
+{
+    BlockDriver *drv = bs->drv;
+    int ret;
+
+    if (!drv) {
+        return -ENOMEDIUM;
+    }
+    if (bdrv_check_request(bs, sector_num, nb_sectors)) {
+        return -EIO;
+    }
+
+    if (bs->copy_on_read) {
+        flags |= BDRV_REQ_COPY_ON_READ;
+    }
+
+    /* throttling disk I/O */
+    if (bs->io_limits_enabled) {
+        bdrv_io_limits_intercept(bs, nb_sectors, false);
+    }
+
+    ret = bdrv_aligned_preadv(bs, sector_num << BDRV_SECTOR_BITS,
+                             nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
+    return ret;
+}
+
 int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
     int nb_sectors, QEMUIOVector *qiov)
 {
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 10/22] block: Introduce bdrv_co_do_preadv()
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
                   ` (8 preceding siblings ...)
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 09/22] block: Introduce bdrv_aligned_preadv() Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 11/22] block: Introduce bdrv_aligned_pwritev() Kevin Wolf
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

Similar to bdrv_pread(), which aligns byte-aligned request to 512 byte
sectors, bdrv_co_do_preadv() takes a byte-aligned request and aligns it
to the alignment specified in bs->request_alignment.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 57 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index cbc9f6d..7812d8f 100644
--- a/block.c
+++ b/block.c
@@ -2786,17 +2786,23 @@ out:
 /*
  * Handle a read request in coroutine context
  */
-static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
-    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
+static int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
+    int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
     BdrvRequestFlags flags)
 {
     BlockDriver *drv = bs->drv;
+    /* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */
+    uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment);
+    uint8_t *head_buf = NULL;
+    uint8_t *tail_buf = NULL;
+    QEMUIOVector local_qiov;
+    bool use_local_qiov = false;
     int ret;
 
     if (!drv) {
         return -ENOMEDIUM;
     }
-    if (bdrv_check_request(bs, sector_num, nb_sectors)) {
+    if (bdrv_check_byte_request(bs, offset, bytes)) {
         return -EIO;
     }
 
@@ -2806,14 +2812,59 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
 
     /* throttling disk I/O */
     if (bs->io_limits_enabled) {
-        bdrv_io_limits_intercept(bs, nb_sectors, false);
+        bdrv_io_limits_intercept(bs, bytes >> BDRV_SECTOR_BITS, false);
+    }
+
+    /* Align read if necessary by padding qiov */
+    if (offset & (align - 1)) {
+        head_buf = qemu_blockalign(bs, align);
+        qemu_iovec_init(&local_qiov, qiov->niov + 2);
+        qemu_iovec_add(&local_qiov, head_buf, offset & (align - 1));
+        qemu_iovec_concat(&local_qiov, qiov, 0, qiov->size);
+        use_local_qiov = true;
+
+        bytes += offset & (align - 1);
+        offset = offset & ~(align - 1);
+    }
+
+    if ((offset + bytes) & (align - 1)) {
+        if (!use_local_qiov) {
+            qemu_iovec_init(&local_qiov, qiov->niov + 1);
+            qemu_iovec_concat(&local_qiov, qiov, 0, qiov->size);
+            use_local_qiov = true;
+        }
+        tail_buf = qemu_blockalign(bs, align);
+        qemu_iovec_add(&local_qiov, tail_buf,
+                       align - ((offset + bytes) & (align - 1)));
+
+        bytes = ROUND_UP(bytes, align);
+    }
+
+    ret = bdrv_aligned_preadv(bs, offset, bytes,
+                              use_local_qiov ? &local_qiov : qiov,
+                              flags);
+
+    if (use_local_qiov) {
+        qemu_iovec_destroy(&local_qiov);
+        qemu_vfree(head_buf);
+        qemu_vfree(tail_buf);
     }
 
-    ret = bdrv_aligned_preadv(bs, sector_num << BDRV_SECTOR_BITS,
-                             nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
     return ret;
 }
 
+static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
+    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
+    BdrvRequestFlags flags)
+{
+    if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) {
+        return -EINVAL;
+    }
+
+    return bdrv_co_do_preadv(bs, sector_num << BDRV_SECTOR_BITS,
+                             nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
+}
+
 int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
     int nb_sectors, QEMUIOVector *qiov)
 {
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 11/22] block: Introduce bdrv_aligned_pwritev()
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
                   ` (9 preceding siblings ...)
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 10/22] block: Introduce bdrv_co_do_preadv() Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 12/22] block: write: Handle COR dependency after I/O throttling Kevin Wolf
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

This separates the part of bdrv_co_do_writev() that needs to happen
before the request is modified to match the backend alignment, and a
part that needs to be executed afterwards and passes the request to the
BlockDriver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 62 +++++++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 41 insertions(+), 21 deletions(-)

diff --git a/block.c b/block.c
index 7812d8f..e26e31c 100644
--- a/block.c
+++ b/block.c
@@ -2958,34 +2958,20 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
 }
 
 /*
- * Handle a write request in coroutine context
+ * Forwards an already correctly aligned write request to the BlockDriver.
  */
-static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
-    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
-    BdrvRequestFlags flags)
+static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
+    int64_t offset, unsigned int bytes, QEMUIOVector *qiov, int flags)
 {
     BlockDriver *drv = bs->drv;
     BdrvTrackedRequest req;
     int ret;
 
-    if (!bs->drv) {
-        return -ENOMEDIUM;
-    }
-    if (bs->read_only) {
-        return -EACCES;
-    }
-    if (bdrv_check_request(bs, sector_num, nb_sectors)) {
-        return -EIO;
-    }
-
-    if (bs->copy_on_read_in_flight) {
-        wait_for_overlapping_requests(bs, sector_num, nb_sectors);
-    }
+    int64_t sector_num = offset >> BDRV_SECTOR_BITS;
+    unsigned int nb_sectors = bytes >> BDRV_SECTOR_BITS;
 
-    /* throttling disk I/O */
-    if (bs->io_limits_enabled) {
-        bdrv_io_limits_intercept(bs, nb_sectors, true);
-    }
+    assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
+    assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
 
     tracked_request_begin(&req, bs, sector_num, nb_sectors, true);
 
@@ -3017,6 +3003,40 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
     return ret;
 }
 
+/*
+ * Handle a write request in coroutine context
+ */
+static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
+    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
+    BdrvRequestFlags flags)
+{
+    int ret;
+
+    if (!bs->drv) {
+        return -ENOMEDIUM;
+    }
+    if (bs->read_only) {
+        return -EACCES;
+    }
+    if (bdrv_check_request(bs, sector_num, nb_sectors)) {
+        return -EIO;
+    }
+
+    if (bs->copy_on_read_in_flight) {
+        wait_for_overlapping_requests(bs, sector_num, nb_sectors);
+    }
+
+    /* throttling disk I/O */
+    if (bs->io_limits_enabled) {
+        bdrv_io_limits_intercept(bs, nb_sectors, true);
+    }
+
+    ret = bdrv_aligned_pwritev(bs, sector_num << BDRV_SECTOR_BITS,
+                               nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
+
+    return ret;
+}
+
 int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
     int nb_sectors, QEMUIOVector *qiov)
 {
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 12/22] block: write: Handle COR dependency after I/O throttling
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
                   ` (10 preceding siblings ...)
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 11/22] block: Introduce bdrv_aligned_pwritev() Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 13/22] block: Introduce bdrv_co_do_pwritev() Kevin Wolf
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

First waiting for all COR requests to complete and calling the
throttling function afterwards means that the request could be delayed
and we still need to wait for the COR request even if it was issued only
after the throttled write request.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/block.c b/block.c
index e26e31c..385fb8a 100644
--- a/block.c
+++ b/block.c
@@ -2973,6 +2973,10 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
     assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
     assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
 
+    if (bs->copy_on_read_in_flight) {
+        wait_for_overlapping_requests(bs, sector_num, nb_sectors);
+    }
+
     tracked_request_begin(&req, bs, sector_num, nb_sectors, true);
 
     ret = notifier_with_return_list_notify(&bs->before_write_notifiers, &req);
@@ -3022,10 +3026,6 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
         return -EIO;
     }
 
-    if (bs->copy_on_read_in_flight) {
-        wait_for_overlapping_requests(bs, sector_num, nb_sectors);
-    }
-
     /* throttling disk I/O */
     if (bs->io_limits_enabled) {
         bdrv_io_limits_intercept(bs, nb_sectors, true);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 13/22] block: Introduce bdrv_co_do_pwritev()
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
                   ` (11 preceding siblings ...)
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 12/22] block: write: Handle COR dependency after I/O throttling Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 14/22] block: Switch BdrvTrackedRequest to byte granularity Kevin Wolf
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

This is going to become the bdrv_co_do_preadv() equivalent for writes.
In this patch, however, just a function taking byte offsets is created,
it doesn't align anything yet.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index 385fb8a..a80db2e 100644
--- a/block.c
+++ b/block.c
@@ -3010,8 +3010,8 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
 /*
  * Handle a write request in coroutine context
  */
-static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
-    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
+static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
+    int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
     BdrvRequestFlags flags)
 {
     int ret;
@@ -3022,21 +3022,32 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
     if (bs->read_only) {
         return -EACCES;
     }
-    if (bdrv_check_request(bs, sector_num, nb_sectors)) {
+    if (bdrv_check_byte_request(bs, offset, bytes)) {
         return -EIO;
     }
 
     /* throttling disk I/O */
     if (bs->io_limits_enabled) {
-        bdrv_io_limits_intercept(bs, nb_sectors, true);
+        bdrv_io_limits_intercept(bs, bytes << BDRV_SECTOR_BITS, true);
     }
 
-    ret = bdrv_aligned_pwritev(bs, sector_num << BDRV_SECTOR_BITS,
-                               nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
+    ret = bdrv_aligned_pwritev(bs, offset, bytes, qiov, flags);
 
     return ret;
 }
 
+static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
+    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
+    BdrvRequestFlags flags)
+{
+    if (nb_sectors < 0 || nb_sectors > (UINT_MAX >> BDRV_SECTOR_BITS)) {
+        return -EINVAL;
+    }
+
+    return bdrv_co_do_pwritev(bs, sector_num << BDRV_SECTOR_BITS,
+                              nb_sectors << BDRV_SECTOR_BITS, qiov, flags);
+}
+
 int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
     int nb_sectors, QEMUIOVector *qiov)
 {
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 14/22] block: Switch BdrvTrackedRequest to byte granularity
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
                   ` (12 preceding siblings ...)
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 13/22] block: Introduce bdrv_co_do_pwritev() Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 15/22] block: Allow waiting for overlapping requests between begin/end Kevin Wolf
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c                   | 52 +++++++++++++++++++++++++++++++----------------
 block/backup.c            |  7 ++++++-
 include/block/block_int.h |  4 ++--
 3 files changed, 42 insertions(+), 21 deletions(-)

diff --git a/block.c b/block.c
index a80db2e..fa888d9 100644
--- a/block.c
+++ b/block.c
@@ -2037,13 +2037,13 @@ static void tracked_request_end(BdrvTrackedRequest *req)
  */
 static void tracked_request_begin(BdrvTrackedRequest *req,
                                   BlockDriverState *bs,
-                                  int64_t sector_num,
-                                  int nb_sectors, bool is_write)
+                                  int64_t offset,
+                                  unsigned int bytes, bool is_write)
 {
     *req = (BdrvTrackedRequest){
         .bs = bs,
-        .sector_num = sector_num,
-        .nb_sectors = nb_sectors,
+        .offset = offset,
+        .bytes = bytes,
         .is_write = is_write,
         .co = qemu_coroutine_self(),
     };
@@ -2074,25 +2074,43 @@ void bdrv_round_to_clusters(BlockDriverState *bs,
     }
 }
 
+static void round_bytes_to_clusters(BlockDriverState *bs,
+                                    int64_t offset, unsigned int bytes,
+                                    int64_t *cluster_offset,
+                                    unsigned int *cluster_bytes)
+{
+    BlockDriverInfo bdi;
+
+    if (bdrv_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
+        *cluster_offset = offset;
+        *cluster_bytes = bytes;
+    } else {
+        *cluster_offset = QEMU_ALIGN_DOWN(offset, bdi.cluster_size);
+        *cluster_bytes = QEMU_ALIGN_UP(offset - *cluster_offset + bytes,
+                                       bdi.cluster_size);
+    }
+}
+
 static bool tracked_request_overlaps(BdrvTrackedRequest *req,
-                                     int64_t sector_num, int nb_sectors) {
+                                     int64_t offset, int bytes)
+{
     /*        aaaa   bbbb */
-    if (sector_num >= req->sector_num + req->nb_sectors) {
+    if (offset >= req->offset + req->bytes) {
         return false;
     }
     /* bbbb   aaaa        */
-    if (req->sector_num >= sector_num + nb_sectors) {
+    if (req->offset >= offset + bytes) {
         return false;
     }
     return true;
 }
 
 static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
-        int64_t sector_num, int nb_sectors)
+        int64_t offset, unsigned int bytes)
 {
     BdrvTrackedRequest *req;
-    int64_t cluster_sector_num;
-    int cluster_nb_sectors;
+    int64_t cluster_offset;
+    unsigned int cluster_bytes;
     bool retry;
 
     /* If we touch the same cluster it counts as an overlap.  This guarantees
@@ -2101,14 +2119,12 @@ static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
      * CoR read and write operations are atomic and guest writes cannot
      * interleave between them.
      */
-    bdrv_round_to_clusters(bs, sector_num, nb_sectors,
-                           &cluster_sector_num, &cluster_nb_sectors);
+    round_bytes_to_clusters(bs, offset, bytes, &cluster_offset, &cluster_bytes);
 
     do {
         retry = false;
         QLIST_FOREACH(req, &bs->tracked_requests, list) {
-            if (tracked_request_overlaps(req, cluster_sector_num,
-                                         cluster_nb_sectors)) {
+            if (tracked_request_overlaps(req, cluster_offset, cluster_bytes)) {
                 /* Hitting this means there was a reentrant request, for
                  * example, a block driver issuing nested requests.  This must
                  * never happen since it means deadlock.
@@ -2723,10 +2739,10 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
     }
 
     if (bs->copy_on_read_in_flight) {
-        wait_for_overlapping_requests(bs, sector_num, nb_sectors);
+        wait_for_overlapping_requests(bs, offset, bytes);
     }
 
-    tracked_request_begin(&req, bs, sector_num, nb_sectors, false);
+    tracked_request_begin(&req, bs, offset, bytes, false);
 
     if (flags & BDRV_REQ_COPY_ON_READ) {
         int pnum;
@@ -2974,10 +2990,10 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
     assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
 
     if (bs->copy_on_read_in_flight) {
-        wait_for_overlapping_requests(bs, sector_num, nb_sectors);
+        wait_for_overlapping_requests(bs, offset, bytes);
     }
 
-    tracked_request_begin(&req, bs, sector_num, nb_sectors, true);
+    tracked_request_begin(&req, bs, offset, bytes, true);
 
     ret = notifier_with_return_list_notify(&bs->before_write_notifiers, &req);
 
diff --git a/block/backup.c b/block/backup.c
index 0198514..15a2e55 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -181,8 +181,13 @@ static int coroutine_fn backup_before_write_notify(
         void *opaque)
 {
     BdrvTrackedRequest *req = opaque;
+    int64_t sector_num = req->offset >> BDRV_SECTOR_BITS;
+    int nb_sectors = req->bytes >> BDRV_SECTOR_BITS;
 
-    return backup_do_cow(req->bs, req->sector_num, req->nb_sectors, NULL);
+    assert((req->offset & (BDRV_SECTOR_SIZE - 1)) == 0);
+    assert((req->bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
+
+    return backup_do_cow(req->bs, sector_num, nb_sectors, NULL);
 }
 
 static void backup_set_speed(BlockJob *job, int64_t speed, Error **errp)
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 0a01b69..a11e5c9 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -57,8 +57,8 @@
 
 typedef struct BdrvTrackedRequest {
     BlockDriverState *bs;
-    int64_t sector_num;
-    int nb_sectors;
+    int64_t offset;
+    unsigned int bytes;
     bool is_write;
     QLIST_ENTRY(BdrvTrackedRequest) list;
     Coroutine *co; /* owner, used for deadlock detection */
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 15/22] block: Allow waiting for overlapping requests between begin/end
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
                   ` (13 preceding siblings ...)
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 14/22] block: Switch BdrvTrackedRequest to byte granularity Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 16/22] block: Make zero-after-EOF work with larger alignment Kevin Wolf
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

Previously, it was not possible to use wait_for_overlapping_requests()
between tracked_request_begin()/end() because it would wait for itself.

Ignore the current request in the overlap check and run more of the
bdrv_co_do_preadv/pwritev code with a BdrvTrackedRequest present.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/block.c b/block.c
index fa888d9..b4f6ead 100644
--- a/block.c
+++ b/block.c
@@ -2106,7 +2106,7 @@ static bool tracked_request_overlaps(BdrvTrackedRequest *req,
 }
 
 static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
-        int64_t offset, unsigned int bytes)
+        BdrvTrackedRequest *self, int64_t offset, unsigned int bytes)
 {
     BdrvTrackedRequest *req;
     int64_t cluster_offset;
@@ -2124,6 +2124,9 @@ static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
     do {
         retry = false;
         QLIST_FOREACH(req, &bs->tracked_requests, list) {
+            if (req == self) {
+                continue;
+            }
             if (tracked_request_overlaps(req, cluster_offset, cluster_bytes)) {
                 /* Hitting this means there was a reentrant request, for
                  * example, a block driver issuing nested requests.  This must
@@ -2721,10 +2724,10 @@ err:
  * implemented by the caller.
  */
 static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
-    int64_t offset, unsigned int bytes, QEMUIOVector *qiov, int flags)
+    BdrvTrackedRequest *req, int64_t offset, unsigned int bytes,
+    QEMUIOVector *qiov, int flags)
 {
     BlockDriver *drv = bs->drv;
-    BdrvTrackedRequest req;
     int ret;
 
     int64_t sector_num = offset >> BDRV_SECTOR_BITS;
@@ -2739,11 +2742,9 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
     }
 
     if (bs->copy_on_read_in_flight) {
-        wait_for_overlapping_requests(bs, offset, bytes);
+        wait_for_overlapping_requests(bs, req, offset, bytes);
     }
 
-    tracked_request_begin(&req, bs, offset, bytes, false);
-
     if (flags & BDRV_REQ_COPY_ON_READ) {
         int pnum;
 
@@ -2790,8 +2791,6 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
     }
 
 out:
-    tracked_request_end(&req);
-
     if (flags & BDRV_REQ_COPY_ON_READ) {
         bs->copy_on_read_in_flight--;
     }
@@ -2807,6 +2806,8 @@ static int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
     BdrvRequestFlags flags)
 {
     BlockDriver *drv = bs->drv;
+    BdrvTrackedRequest req;
+
     /* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */
     uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment);
     uint8_t *head_buf = NULL;
@@ -2856,9 +2857,11 @@ static int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
         bytes = ROUND_UP(bytes, align);
     }
 
-    ret = bdrv_aligned_preadv(bs, offset, bytes,
+    tracked_request_begin(&req, bs, offset, bytes, false);
+    ret = bdrv_aligned_preadv(bs, &req, offset, bytes,
                               use_local_qiov ? &local_qiov : qiov,
                               flags);
+    tracked_request_end(&req);
 
     if (use_local_qiov) {
         qemu_iovec_destroy(&local_qiov);
@@ -2977,10 +2980,10 @@ static int coroutine_fn bdrv_co_do_write_zeroes(BlockDriverState *bs,
  * Forwards an already correctly aligned write request to the BlockDriver.
  */
 static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
-    int64_t offset, unsigned int bytes, QEMUIOVector *qiov, int flags)
+    BdrvTrackedRequest *req, int64_t offset, unsigned int bytes,
+    QEMUIOVector *qiov, int flags)
 {
     BlockDriver *drv = bs->drv;
-    BdrvTrackedRequest req;
     int ret;
 
     int64_t sector_num = offset >> BDRV_SECTOR_BITS;
@@ -2990,12 +2993,10 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
     assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
 
     if (bs->copy_on_read_in_flight) {
-        wait_for_overlapping_requests(bs, offset, bytes);
+        wait_for_overlapping_requests(bs, req, offset, bytes);
     }
 
-    tracked_request_begin(&req, bs, offset, bytes, true);
-
-    ret = notifier_with_return_list_notify(&bs->before_write_notifiers, &req);
+    ret = notifier_with_return_list_notify(&bs->before_write_notifiers, req);
 
     if (ret < 0) {
         /* Do nothing, write notifier decided to fail this request */
@@ -3018,8 +3019,6 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
         bs->total_sectors = MAX(bs->total_sectors, sector_num + nb_sectors);
     }
 
-    tracked_request_end(&req);
-
     return ret;
 }
 
@@ -3030,6 +3029,7 @@ static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
     int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
     BdrvRequestFlags flags)
 {
+    BdrvTrackedRequest req;
     int ret;
 
     if (!bs->drv) {
@@ -3047,7 +3047,9 @@ static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
         bdrv_io_limits_intercept(bs, bytes << BDRV_SECTOR_BITS, true);
     }
 
-    ret = bdrv_aligned_pwritev(bs, offset, bytes, qiov, flags);
+    tracked_request_begin(&req, bs, offset, bytes, true);
+    ret = bdrv_aligned_pwritev(bs, &req, offset, bytes, qiov, flags);
+    tracked_request_end(&req);
 
     return ret;
 }
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 16/22] block: Make zero-after-EOF work with larger alignment
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
                   ` (14 preceding siblings ...)
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 15/22] block: Allow waiting for overlapping requests between begin/end Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 17/22] block: Generalise and optimise COR serialisation Kevin Wolf
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

Odd file sizes could make bdrv_aligned_preadv() shorten the request in
non-aligned ways. Fix it by rounding to the required alignment instead
of 512 bytes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index b4f6ead..6dddb7c 100644
--- a/block.c
+++ b/block.c
@@ -2725,7 +2725,7 @@ err:
  */
 static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
     BdrvTrackedRequest *req, int64_t offset, unsigned int bytes,
-    QEMUIOVector *qiov, int flags)
+    int64_t align, QEMUIOVector *qiov, int flags)
 {
     BlockDriver *drv = bs->drv;
     int ret;
@@ -2773,7 +2773,7 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
         }
 
         total_sectors = DIV_ROUND_UP(len, BDRV_SECTOR_SIZE);
-        max_nb_sectors = MAX(0, total_sectors - sector_num);
+        max_nb_sectors = MAX(0, ROUND_UP(total_sectors - sector_num, align));
         if (max_nb_sectors > 0) {
             ret = drv->bdrv_co_readv(bs, sector_num,
                                      MIN(nb_sectors, max_nb_sectors), qiov);
@@ -2858,7 +2858,7 @@ static int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
     }
 
     tracked_request_begin(&req, bs, offset, bytes, false);
-    ret = bdrv_aligned_preadv(bs, &req, offset, bytes,
+    ret = bdrv_aligned_preadv(bs, &req, offset, bytes, align,
                               use_local_qiov ? &local_qiov : qiov,
                               flags);
     tracked_request_end(&req);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 17/22] block: Generalise and optimise COR serialisation
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
                   ` (15 preceding siblings ...)
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 16/22] block: Make zero-after-EOF work with larger alignment Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 18/22] block: Make overlap range for serialisation dynamic Kevin Wolf
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

Change the API so that specific requests can be marked serialising. Only
these requests are checked for overlaps then.

This means that during a Copy on Read operation, not all requests
overlapping other requests are serialised any more, but only those that
actually overlap with the specific COR request.

Also remove COR from function and variable names because this
functionality can be useful in other contexts.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c                   | 40 +++++++++++++++++++++++++---------------
 include/block/block_int.h |  5 +++--
 2 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/block.c b/block.c
index 6dddb7c..40be327 100644
--- a/block.c
+++ b/block.c
@@ -2028,6 +2028,10 @@ int bdrv_commit_all(void)
  */
 static void tracked_request_end(BdrvTrackedRequest *req)
 {
+    if (req->serialising) {
+        req->bs->serialising_in_flight--;
+    }
+
     QLIST_REMOVE(req, list);
     qemu_co_queue_restart_all(&req->wait_queue);
 }
@@ -2046,6 +2050,7 @@ static void tracked_request_begin(BdrvTrackedRequest *req,
         .bytes = bytes,
         .is_write = is_write,
         .co = qemu_coroutine_self(),
+        .serialising = false,
     };
 
     qemu_co_queue_init(&req->wait_queue);
@@ -2053,6 +2058,14 @@ static void tracked_request_begin(BdrvTrackedRequest *req,
     QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
 }
 
+static void mark_request_serialising(BdrvTrackedRequest *req)
+{
+    if (!req->serialising) {
+        req->bs->serialising_in_flight++;
+        req->serialising = true;
+    }
+}
+
 /**
  * Round a region to cluster boundaries
  */
@@ -2105,26 +2118,31 @@ static bool tracked_request_overlaps(BdrvTrackedRequest *req,
     return true;
 }
 
-static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
-        BdrvTrackedRequest *self, int64_t offset, unsigned int bytes)
+static void coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
 {
+    BlockDriverState *bs = self->bs;
     BdrvTrackedRequest *req;
     int64_t cluster_offset;
     unsigned int cluster_bytes;
     bool retry;
 
+    if (!bs->serialising_in_flight) {
+        return;
+    }
+
     /* If we touch the same cluster it counts as an overlap.  This guarantees
      * that allocating writes will be serialized and not race with each other
      * for the same cluster.  For example, in copy-on-read it ensures that the
      * CoR read and write operations are atomic and guest writes cannot
      * interleave between them.
      */
-    round_bytes_to_clusters(bs, offset, bytes, &cluster_offset, &cluster_bytes);
+    round_bytes_to_clusters(bs, self->offset, self->bytes,
+                            &cluster_offset, &cluster_bytes);
 
     do {
         retry = false;
         QLIST_FOREACH(req, &bs->tracked_requests, list) {
-            if (req == self) {
+            if (req == self || (!req->serialising && !self->serialising)) {
                 continue;
             }
             if (tracked_request_overlaps(req, cluster_offset, cluster_bytes)) {
@@ -2738,12 +2756,10 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
 
     /* Handle Copy on Read and associated serialisation */
     if (flags & BDRV_REQ_COPY_ON_READ) {
-        bs->copy_on_read_in_flight++;
+        mark_request_serialising(req);
     }
 
-    if (bs->copy_on_read_in_flight) {
-        wait_for_overlapping_requests(bs, req, offset, bytes);
-    }
+    wait_serialising_requests(req);
 
     if (flags & BDRV_REQ_COPY_ON_READ) {
         int pnum;
@@ -2791,10 +2807,6 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
     }
 
 out:
-    if (flags & BDRV_REQ_COPY_ON_READ) {
-        bs->copy_on_read_in_flight--;
-    }
-
     return ret;
 }
 
@@ -2992,9 +3004,7 @@ static int coroutine_fn bdrv_aligned_pwritev(BlockDriverState *bs,
     assert((offset & (BDRV_SECTOR_SIZE - 1)) == 0);
     assert((bytes & (BDRV_SECTOR_SIZE - 1)) == 0);
 
-    if (bs->copy_on_read_in_flight) {
-        wait_for_overlapping_requests(bs, req, offset, bytes);
-    }
+    wait_serialising_requests(req);
 
     ret = notifier_with_return_list_notify(&bs->before_write_notifiers, req);
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index a11e5c9..b00402b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -60,6 +60,7 @@ typedef struct BdrvTrackedRequest {
     int64_t offset;
     unsigned int bytes;
     bool is_write;
+    bool serialising;
     QLIST_ENTRY(BdrvTrackedRequest) list;
     Coroutine *co; /* owner, used for deadlock detection */
     CoQueue wait_queue; /* coroutines blocked on this request */
@@ -296,8 +297,8 @@ struct BlockDriverState {
     /* Callback before write request is processed */
     NotifierWithReturnList before_write_notifiers;
 
-    /* number of in-flight copy-on-read requests */
-    unsigned int copy_on_read_in_flight;
+    /* number of in-flight serialising requests */
+    unsigned int serialising_in_flight;
 
     /* I/O throttling */
     ThrottleState throttle_state;
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 18/22] block: Make overlap range for serialisation dynamic
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
                   ` (16 preceding siblings ...)
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 17/22] block: Generalise and optimise COR serialisation Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 19/22] block: Align requests in bdrv_co_do_pwritev() Kevin Wolf
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

Copy on Read wants to serialise with all requests touching the same
cluster, so wait_serialising_requests() rounded to cluster boundaries.
Other users like alignment RMW will have different requirements, though
(requests touching the same sector), so make it dynamic.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c                   | 51 +++++++++++++++++++++++------------------------
 include/block/block_int.h |  4 ++++
 2 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/block.c b/block.c
index 40be327..6630b41 100644
--- a/block.c
+++ b/block.c
@@ -2058,12 +2058,19 @@ static void tracked_request_begin(BdrvTrackedRequest *req,
     QLIST_INSERT_HEAD(&bs->tracked_requests, req, list);
 }
 
-static void mark_request_serialising(BdrvTrackedRequest *req)
+static void mark_request_serialising(BdrvTrackedRequest *req, size_t align)
 {
+    int64_t overlap_offset = req->offset & ~(align - 1);
+    int overlap_bytes = ROUND_UP(req->offset + req->bytes, align)
+                      - req->overlap_offset;
+
     if (!req->serialising) {
         req->bs->serialising_in_flight++;
         req->serialising = true;
     }
+
+    req->overlap_offset = MIN(req->overlap_offset, overlap_offset);
+    req->overlap_bytes = MAX(req->overlap_bytes, overlap_bytes);
 }
 
 /**
@@ -2087,20 +2094,16 @@ void bdrv_round_to_clusters(BlockDriverState *bs,
     }
 }
 
-static void round_bytes_to_clusters(BlockDriverState *bs,
-                                    int64_t offset, unsigned int bytes,
-                                    int64_t *cluster_offset,
-                                    unsigned int *cluster_bytes)
+static int bdrv_get_cluster_size(BlockDriverState *bs)
 {
     BlockDriverInfo bdi;
+    int ret;
 
-    if (bdrv_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
-        *cluster_offset = offset;
-        *cluster_bytes = bytes;
+    ret = bdrv_get_info(bs, &bdi);
+    if (ret < 0 || bdi.cluster_size == 0) {
+        return bs->request_alignment;
     } else {
-        *cluster_offset = QEMU_ALIGN_DOWN(offset, bdi.cluster_size);
-        *cluster_bytes = QEMU_ALIGN_UP(offset - *cluster_offset + bytes,
-                                       bdi.cluster_size);
+        return bdi.cluster_size;
     }
 }
 
@@ -2108,11 +2111,11 @@ static bool tracked_request_overlaps(BdrvTrackedRequest *req,
                                      int64_t offset, int bytes)
 {
     /*        aaaa   bbbb */
-    if (offset >= req->offset + req->bytes) {
+    if (offset >= req->overlap_offset + req->overlap_bytes) {
         return false;
     }
     /* bbbb   aaaa        */
-    if (req->offset >= offset + bytes) {
+    if (req->overlap_offset >= offset + bytes) {
         return false;
     }
     return true;
@@ -2122,30 +2125,21 @@ static void coroutine_fn wait_serialising_requests(BdrvTrackedRequest *self)
 {
     BlockDriverState *bs = self->bs;
     BdrvTrackedRequest *req;
-    int64_t cluster_offset;
-    unsigned int cluster_bytes;
     bool retry;
 
     if (!bs->serialising_in_flight) {
         return;
     }
 
-    /* If we touch the same cluster it counts as an overlap.  This guarantees
-     * that allocating writes will be serialized and not race with each other
-     * for the same cluster.  For example, in copy-on-read it ensures that the
-     * CoR read and write operations are atomic and guest writes cannot
-     * interleave between them.
-     */
-    round_bytes_to_clusters(bs, self->offset, self->bytes,
-                            &cluster_offset, &cluster_bytes);
-
     do {
         retry = false;
         QLIST_FOREACH(req, &bs->tracked_requests, list) {
             if (req == self || (!req->serialising && !self->serialising)) {
                 continue;
             }
-            if (tracked_request_overlaps(req, cluster_offset, cluster_bytes)) {
+            if (tracked_request_overlaps(req, self->overlap_offset,
+                                         self->overlap_bytes))
+            {
                 /* Hitting this means there was a reentrant request, for
                  * example, a block driver issuing nested requests.  This must
                  * never happen since it means deadlock.
@@ -2756,7 +2750,12 @@ static int coroutine_fn bdrv_aligned_preadv(BlockDriverState *bs,
 
     /* Handle Copy on Read and associated serialisation */
     if (flags & BDRV_REQ_COPY_ON_READ) {
-        mark_request_serialising(req);
+        /* If we touch the same cluster it counts as an overlap.  This
+         * guarantees that allocating writes will be serialized and not race
+         * with each other for the same cluster.  For example, in copy-on-read
+         * it ensures that the CoR read and write operations are atomic and
+         * guest writes cannot interleave between them. */
+        mark_request_serialising(req, bdrv_get_cluster_size(bs));
     }
 
     wait_serialising_requests(req);
diff --git a/include/block/block_int.h b/include/block/block_int.h
index b00402b..1aee02b 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -60,7 +60,11 @@ typedef struct BdrvTrackedRequest {
     int64_t offset;
     unsigned int bytes;
     bool is_write;
+
     bool serialising;
+    int64_t overlap_offset;
+    unsigned int overlap_bytes;
+
     QLIST_ENTRY(BdrvTrackedRequest) list;
     Coroutine *co; /* owner, used for deadlock detection */
     CoQueue wait_queue; /* coroutines blocked on this request */
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 19/22] block: Align requests in bdrv_co_do_pwritev()
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
                   ` (17 preceding siblings ...)
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 18/22] block: Make overlap range for serialisation dynamic Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 20/22] block: Change coroutine wrapper to byte granularity Kevin Wolf
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

This patch changes bdrv_co_do_pwritev() to actually be what its name
promises. If requests aren't properly aligned, it performs a RMW.

Requests touching the same block are serialised against the RMW request.
Further optimisation of this is possible by differentiating types of
requests (concurrent reads should actually be okay here).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 85 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 6630b41..98ea723 100644
--- a/block.c
+++ b/block.c
@@ -3039,6 +3039,12 @@ static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
     BdrvRequestFlags flags)
 {
     BdrvTrackedRequest req;
+    /* TODO Lift BDRV_SECTOR_SIZE restriction in BlockDriver interface */
+    uint64_t align = MAX(BDRV_SECTOR_SIZE, bs->request_alignment);
+    uint8_t *head_buf = NULL;
+    uint8_t *tail_buf = NULL;
+    QEMUIOVector local_qiov;
+    bool use_local_qiov = false;
     int ret;
 
     if (!bs->drv) {
@@ -3056,10 +3062,88 @@ static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
         bdrv_io_limits_intercept(bs, bytes << BDRV_SECTOR_BITS, true);
     }
 
+    /*
+     * Align write if necessary by performing a read-modify-write cycle.
+     * Pad qiov with the read parts and be sure to have a tracked request not
+     * only for bdrv_aligned_pwritev, but also for the reads of the RMW cycle.
+     */
     tracked_request_begin(&req, bs, offset, bytes, true);
-    ret = bdrv_aligned_pwritev(bs, &req, offset, bytes, qiov, flags);
+
+    if (offset & (align - 1)) {
+        QEMUIOVector head_qiov;
+        struct iovec head_iov;
+
+        mark_request_serialising(&req, align);
+        wait_serialising_requests(&req);
+
+        head_buf = qemu_blockalign(bs, align);
+        head_iov = (struct iovec) {
+            .iov_base   = head_buf,
+            .iov_len    = align,
+        };
+        qemu_iovec_init_external(&head_qiov, &head_iov, 1);
+
+        ret = bdrv_aligned_preadv(bs, &req, offset & ~(align - 1), align,
+                                  align, &head_qiov, 0);
+        if (ret < 0) {
+            goto fail;
+        }
+
+        qemu_iovec_init(&local_qiov, qiov->niov + 2);
+        qemu_iovec_add(&local_qiov, head_buf, offset & (align - 1));
+        qemu_iovec_concat(&local_qiov, qiov, 0, qiov->size);
+        use_local_qiov = true;
+
+        bytes += offset & (align - 1);
+        offset = offset & ~(align - 1);
+    }
+
+    if ((offset + bytes) & (align - 1)) {
+        QEMUIOVector tail_qiov;
+        struct iovec tail_iov;
+        size_t tail_bytes;
+
+        mark_request_serialising(&req, align);
+        wait_serialising_requests(&req);
+
+        tail_buf = qemu_blockalign(bs, align);
+        tail_iov = (struct iovec) {
+            .iov_base   = tail_buf,
+            .iov_len    = align,
+        };
+        qemu_iovec_init_external(&tail_qiov, &tail_iov, 1);
+
+        ret = bdrv_aligned_preadv(bs, &req, (offset + bytes) & ~(align - 1), align,
+                                  align, &tail_qiov, 0);
+        if (ret < 0) {
+            goto fail;
+        }
+
+        if (!use_local_qiov) {
+            qemu_iovec_init(&local_qiov, qiov->niov + 1);
+            qemu_iovec_concat(&local_qiov, qiov, 0, qiov->size);
+            use_local_qiov = true;
+        }
+
+        tail_bytes = (offset + bytes) & (align - 1);
+        qemu_iovec_add(&local_qiov, tail_buf + tail_bytes, align - tail_bytes);
+
+        bytes = ROUND_UP(bytes, align);
+    }
+
+    ret = bdrv_aligned_pwritev(bs, &req, offset, bytes,
+                               use_local_qiov ? &local_qiov : qiov,
+                               flags);
+
+fail:
     tracked_request_end(&req);
 
+    if (use_local_qiov) {
+        qemu_iovec_destroy(&local_qiov);
+        qemu_vfree(head_buf);
+        qemu_vfree(tail_buf);
+    }
+
     return ret;
 }
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 20/22] block: Change coroutine wrapper to byte granularity
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
                   ` (18 preceding siblings ...)
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 19/22] block: Align requests in bdrv_co_do_pwritev() Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 21/22] block: Make bdrv_pread() a bdrv_prwv_co() wrapper Kevin Wolf
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 48 ++++++++++++++++++++++++++----------------------
 1 file changed, 26 insertions(+), 22 deletions(-)

diff --git a/block.c b/block.c
index 98ea723..5e2e092 100644
--- a/block.c
+++ b/block.c
@@ -69,11 +69,11 @@ static int coroutine_fn bdrv_co_readv_em(BlockDriverState *bs,
 static int coroutine_fn bdrv_co_writev_em(BlockDriverState *bs,
                                          int64_t sector_num, int nb_sectors,
                                          QEMUIOVector *iov);
-static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
-    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
+static int coroutine_fn bdrv_co_do_preadv(BlockDriverState *bs,
+    int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
     BdrvRequestFlags flags);
-static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
-    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov,
+static int coroutine_fn bdrv_co_do_pwritev(BlockDriverState *bs,
+    int64_t offset, unsigned int bytes, QEMUIOVector *qiov,
     BdrvRequestFlags flags);
 static BlockDriverAIOCB *bdrv_co_aio_rw_vector(BlockDriverState *bs,
                                                int64_t sector_num,
@@ -2361,8 +2361,7 @@ static int bdrv_check_request(BlockDriverState *bs, int64_t sector_num,
 
 typedef struct RwCo {
     BlockDriverState *bs;
-    int64_t sector_num;
-    int nb_sectors;
+    int64_t offset;
     QEMUIOVector *qiov;
     bool is_write;
     int ret;
@@ -2374,34 +2373,32 @@ static void coroutine_fn bdrv_rw_co_entry(void *opaque)
     RwCo *rwco = opaque;
 
     if (!rwco->is_write) {
-        rwco->ret = bdrv_co_do_readv(rwco->bs, rwco->sector_num,
-                                     rwco->nb_sectors, rwco->qiov,
-                                     rwco->flags);
-    } else {
-        rwco->ret = bdrv_co_do_writev(rwco->bs, rwco->sector_num,
-                                      rwco->nb_sectors, rwco->qiov,
+        rwco->ret = bdrv_co_do_preadv(rwco->bs, rwco->offset,
+                                      rwco->qiov->size, rwco->qiov,
                                       rwco->flags);
+    } else {
+        rwco->ret = bdrv_co_do_pwritev(rwco->bs, rwco->offset,
+                                       rwco->qiov->size, rwco->qiov,
+                                       rwco->flags);
     }
 }
 
 /*
  * Process a vectored synchronous request using coroutines
  */
-static int bdrv_rwv_co(BlockDriverState *bs, int64_t sector_num,
-                       QEMUIOVector *qiov, bool is_write,
-                       BdrvRequestFlags flags)
+static int bdrv_prwv_co(BlockDriverState *bs, int64_t offset,
+                        QEMUIOVector *qiov, bool is_write,
+                        BdrvRequestFlags flags)
 {
     Coroutine *co;
     RwCo rwco = {
         .bs = bs,
-        .sector_num = sector_num,
-        .nb_sectors = qiov->size >> BDRV_SECTOR_BITS,
+        .offset = offset,
         .qiov = qiov,
         .is_write = is_write,
         .ret = NOT_DONE,
         .flags = flags,
     };
-    assert((qiov->size & (BDRV_SECTOR_SIZE - 1)) == 0);
 
     /**
      * In sync call context, when the vcpu is blocked, this throttling timer
@@ -2440,7 +2437,8 @@ static int bdrv_rw_co(BlockDriverState *bs, int64_t sector_num, uint8_t *buf,
     };
 
     qemu_iovec_init_external(&qiov, &iov, 1);
-    return bdrv_rwv_co(bs, sector_num, &qiov, is_write, flags);
+    return bdrv_prwv_co(bs, sector_num << BDRV_SECTOR_BITS,
+                        &qiov, is_write, flags);
 }
 
 /* return < 0 if error. See bdrv_write() for the return codes */
@@ -2478,7 +2476,7 @@ int bdrv_write(BlockDriverState *bs, int64_t sector_num,
 
 int bdrv_writev(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *qiov)
 {
-    return bdrv_rwv_co(bs, sector_num, qiov, true, 0);
+    return bdrv_prwv_co(bs, sector_num << BDRV_SECTOR_BITS, qiov, true, 0);
 }
 
 int bdrv_write_zeroes(BlockDriverState *bs, int64_t sector_num,
@@ -4586,9 +4584,15 @@ int bdrv_flush(BlockDriverState *bs)
     return rwco.ret;
 }
 
+typedef struct DiscardCo {
+    BlockDriverState *bs;
+    int64_t sector_num;
+    int nb_sectors;
+    int ret;
+} DiscardCo;
 static void coroutine_fn bdrv_discard_co_entry(void *opaque)
 {
-    RwCo *rwco = opaque;
+    DiscardCo *rwco = opaque;
 
     rwco->ret = bdrv_co_discard(rwco->bs, rwco->sector_num, rwco->nb_sectors);
 }
@@ -4672,7 +4676,7 @@ int coroutine_fn bdrv_co_discard(BlockDriverState *bs, int64_t sector_num,
 int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors)
 {
     Coroutine *co;
-    RwCo rwco = {
+    DiscardCo rwco = {
         .bs = bs,
         .sector_num = sector_num,
         .nb_sectors = nb_sectors,
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 21/22] block: Make bdrv_pread() a bdrv_prwv_co() wrapper
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
                   ` (19 preceding siblings ...)
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 20/22] block: Change coroutine wrapper to byte granularity Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 22/22] block: Make bdrv_pwrite() " Kevin Wolf
  2013-12-12  6:09 ` [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Peter Lieven
  22 siblings, 0 replies; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

Instead of implementing the alignment adjustment here, use the now
existing functionality of bdrv_co_do_preadv().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 49 +++++++++++++------------------------------------
 1 file changed, 13 insertions(+), 36 deletions(-)

diff --git a/block.c b/block.c
index 5e2e092..e8653e3 100644
--- a/block.c
+++ b/block.c
@@ -2523,49 +2523,26 @@ int bdrv_make_zero(BlockDriverState *bs, BdrvRequestFlags flags)
     }
 }
 
-int bdrv_pread(BlockDriverState *bs, int64_t offset,
-               void *buf, int count1)
+int bdrv_pread(BlockDriverState *bs, int64_t offset, void *buf, int bytes)
 {
-    uint8_t tmp_buf[BDRV_SECTOR_SIZE];
-    int len, nb_sectors, count;
-    int64_t sector_num;
+    QEMUIOVector qiov;
+    struct iovec iov = {
+        .iov_base = (void *)buf,
+        .iov_len = bytes,
+    };
     int ret;
 
-    count = count1;
-    /* first read to align to sector start */
-    len = (BDRV_SECTOR_SIZE - offset) & (BDRV_SECTOR_SIZE - 1);
-    if (len > count)
-        len = count;
-    sector_num = offset >> BDRV_SECTOR_BITS;
-    if (len > 0) {
-        if ((ret = bdrv_read(bs, sector_num, tmp_buf, 1)) < 0)
-            return ret;
-        memcpy(buf, tmp_buf + (offset & (BDRV_SECTOR_SIZE - 1)), len);
-        count -= len;
-        if (count == 0)
-            return count1;
-        sector_num++;
-        buf += len;
+    if (bytes < 0) {
+        return -EINVAL;
     }
 
-    /* read the sectors "in place" */
-    nb_sectors = count >> BDRV_SECTOR_BITS;
-    if (nb_sectors > 0) {
-        if ((ret = bdrv_read(bs, sector_num, buf, nb_sectors)) < 0)
-            return ret;
-        sector_num += nb_sectors;
-        len = nb_sectors << BDRV_SECTOR_BITS;
-        buf += len;
-        count -= len;
+    qemu_iovec_init_external(&qiov, &iov, 1);
+    ret = bdrv_prwv_co(bs, offset, &qiov, false, 0);
+    if (ret < 0) {
+        return ret;
     }
 
-    /* add data from the last sector */
-    if (count > 0) {
-        if ((ret = bdrv_read(bs, sector_num, tmp_buf, 1)) < 0)
-            return ret;
-        memcpy(buf, tmp_buf, count);
-    }
-    return count1;
+    return bytes;
 }
 
 int bdrv_pwritev(BlockDriverState *bs, int64_t offset, QEMUIOVector *qiov)
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 22/22] block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
                   ` (20 preceding siblings ...)
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 21/22] block: Make bdrv_pread() a bdrv_prwv_co() wrapper Kevin Wolf
@ 2013-12-11 21:08 ` Kevin Wolf
  2013-12-12  6:09 ` [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Peter Lieven
  22 siblings, 0 replies; 39+ messages in thread
From: Kevin Wolf @ 2013-12-11 21:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, pl, stefanha

Instead of implementing the alignment adjustment here, use the now
existing functionality of bdrv_co_do_pwritev().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block.c | 64 +++++++++-------------------------------------------------------
 1 file changed, 9 insertions(+), 55 deletions(-)

diff --git a/block.c b/block.c
index e8653e3..4ab40f8 100644
--- a/block.c
+++ b/block.c
@@ -2474,11 +2474,6 @@ int bdrv_write(BlockDriverState *bs, int64_t sector_num,
     return bdrv_rw_co(bs, sector_num, (uint8_t *)buf, nb_sectors, true, 0);
 }
 
-int bdrv_writev(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *qiov)
-{
-    return bdrv_prwv_co(bs, sector_num << BDRV_SECTOR_BITS, qiov, true, 0);
-}
-
 int bdrv_write_zeroes(BlockDriverState *bs, int64_t sector_num,
                       int nb_sectors, BdrvRequestFlags flags)
 {
@@ -2547,70 +2542,29 @@ int bdrv_pread(BlockDriverState *bs, int64_t offset, void *buf, int bytes)
 
 int bdrv_pwritev(BlockDriverState *bs, int64_t offset, QEMUIOVector *qiov)
 {
-    uint8_t tmp_buf[BDRV_SECTOR_SIZE];
-    int len, nb_sectors, count;
-    int64_t sector_num;
     int ret;
 
-    count = qiov->size;
-
-    /* first write to align to sector start */
-    len = (BDRV_SECTOR_SIZE - offset) & (BDRV_SECTOR_SIZE - 1);
-    if (len > count)
-        len = count;
-    sector_num = offset >> BDRV_SECTOR_BITS;
-    if (len > 0) {
-        if ((ret = bdrv_read(bs, sector_num, tmp_buf, 1)) < 0)
-            return ret;
-        qemu_iovec_to_buf(qiov, 0, tmp_buf + (offset & (BDRV_SECTOR_SIZE - 1)),
-                          len);
-        if ((ret = bdrv_write(bs, sector_num, tmp_buf, 1)) < 0)
-            return ret;
-        count -= len;
-        if (count == 0)
-            return qiov->size;
-        sector_num++;
-    }
-
-    /* write the sectors "in place" */
-    nb_sectors = count >> BDRV_SECTOR_BITS;
-    if (nb_sectors > 0) {
-        QEMUIOVector qiov_inplace;
-
-        qemu_iovec_init(&qiov_inplace, qiov->niov);
-        qemu_iovec_concat(&qiov_inplace, qiov, len,
-                          nb_sectors << BDRV_SECTOR_BITS);
-        ret = bdrv_writev(bs, sector_num, &qiov_inplace);
-        qemu_iovec_destroy(&qiov_inplace);
-        if (ret < 0) {
-            return ret;
-        }
-
-        sector_num += nb_sectors;
-        len = nb_sectors << BDRV_SECTOR_BITS;
-        count -= len;
+    ret = bdrv_prwv_co(bs, offset, qiov, true, 0);
+    if (ret < 0) {
+        return ret;
     }
 
-    /* add data from the last sector */
-    if (count > 0) {
-        if ((ret = bdrv_read(bs, sector_num, tmp_buf, 1)) < 0)
-            return ret;
-        qemu_iovec_to_buf(qiov, qiov->size - count, tmp_buf, count);
-        if ((ret = bdrv_write(bs, sector_num, tmp_buf, 1)) < 0)
-            return ret;
-    }
     return qiov->size;
 }
 
 int bdrv_pwrite(BlockDriverState *bs, int64_t offset,
-                const void *buf, int count1)
+                const void *buf, int bytes)
 {
     QEMUIOVector qiov;
     struct iovec iov = {
         .iov_base   = (void *) buf,
-        .iov_len    = count1,
+        .iov_len    = bytes,
     };
 
+    if (bytes < 0) {
+        return -EINVAL;
+    }
+
     qemu_iovec_init_external(&qiov, &iov, 1);
     return bdrv_pwritev(bs, offset, &qiov);
 }
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 01/22] block: Move initialisation of BlockLimits to bdrv_refresh_limits()
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 01/22] block: Move initialisation of BlockLimits to bdrv_refresh_limits() Kevin Wolf
@ 2013-12-12  2:54   ` Wenchao Xia
  2013-12-12  2:57     ` Wenchao Xia
  2013-12-12  6:17   ` Peter Lieven
  1 sibling, 1 reply; 39+ messages in thread
From: Wenchao Xia @ 2013-12-12  2:54 UTC (permalink / raw)
  To: Kevin Wolf, qemu-devel; +Cc: pbonzini, pl, stefanha

于 2013/12/12 5:08, Kevin Wolf 写道:
> This function separates filling the BlockLimits from bdrv_open(), which
> allows it to call it from other operations which may change the limits
> (e.g. modifications to the backing file chain or bdrv_reopen)
> 
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>   block.c                   | 21 ++++++++++++
>   block/iscsi.c             | 87 ++++++++++++++++++++++++++++-------------------
>   block/qcow2.c             | 11 +++++-
>   block/qed.c               | 11 +++++-
>   block/vmdk.c              | 22 +++++++++---
>   include/block/block_int.h |  2 ++
>   6 files changed, 113 insertions(+), 41 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 13f001a..c32d856 100644
> --- a/block.c
> +++ b/block.c
> @@ -479,6 +479,21 @@ int bdrv_create_file(const char* filename, QEMUOptionParameter *options,
>       return ret;
>   }
> 
> +static int bdrv_refresh_limits(BlockDriverState *bs)
> +{
> +    BlockDriver *drv = bs->drv;
> +
> +    memset(&bs->bl, 0, sizeof(bs->bl));
> +
> +    if (!drv) {
> +        return 0;
> +    } else if (drv->bdrv_refresh_limits) {
> +        return drv->bdrv_refresh_limits(bs);
> +    }
> +
> +    return 0;
    It seems this line can be removed.

> +}
> +
>   /*
>    * Create a uniquely-named empty temporary file.
>    * Return 0 upon success, otherwise a negative errno value.
> @@ -833,6 +848,8 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
>           goto free_and_fail;
>       }
> 
> +    bdrv_refresh_limits(bs);
> +
>   #ifndef _WIN32
>       if (bs->is_temporary) {
>           assert(bs->filename[0] != '\0');
> @@ -1018,6 +1035,10 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
>       }
>       pstrcpy(bs->backing_file, sizeof(bs->backing_file),
>               bs->backing_hd->file->filename);
> +
> +    /* Recalculate the BlockLimits with the backing file */
> +    bdrv_refresh_limits(bs);
> +
>       return 0;
>   }
> 
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 829d444..f3ded8c 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -1425,6 +1425,56 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
>           task = NULL;
>       }
> 
> +#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
> +    /* Set up a timer for sending out iSCSI NOPs */
> +    iscsilun->nop_timer = timer_new_ms(QEMU_CLOCK_REALTIME, iscsi_nop_timed_event, iscsilun);
> +    timer_mod(iscsilun->nop_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
> +#endif
> +
> +out:
> +    qemu_opts_del(opts);
> +    if (initiator_name != NULL) {
> +        g_free(initiator_name);
> +    }
> +    if (iscsi_url != NULL) {
> +        iscsi_destroy_url(iscsi_url);
> +    }
> +    if (task != NULL) {
> +        scsi_free_scsi_task(task);
> +    }
> +
> +    if (ret) {
> +        if (iscsi != NULL) {
> +            iscsi_destroy_context(iscsi);
> +        }
> +        memset(iscsilun, 0, sizeof(IscsiLun));
> +    }
> +    return ret;
> +}
> +
> +static void iscsi_close(BlockDriverState *bs)
> +{
> +    IscsiLun *iscsilun = bs->opaque;
> +    struct iscsi_context *iscsi = iscsilun->iscsi;
> +
> +    if (iscsilun->nop_timer) {
> +        timer_del(iscsilun->nop_timer);
> +        timer_free(iscsilun->nop_timer);
> +    }
> +    qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), NULL, NULL, NULL);
> +    iscsi_destroy_context(iscsi);
> +    g_free(iscsilun->zeroblock);
> +    memset(iscsilun, 0, sizeof(IscsiLun));
> +}
> +
> +static int iscsi_refresh_limits(BlockDriverState *bs)
> +{
> +    IscsiLun *iscsilun = bs->opaque;
> +    struct scsi_task *task = NULL;
> +    int ret;
> +
> +    memset(&bs->bl, 0, sizeof(bs->bl));
> +
>       if (iscsilun->lbp.lbpu || iscsilun->lbp.lbpws) {
>           struct scsi_inquiry_block_limits *inq_bl;
>           task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
> @@ -1462,48 +1512,14 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
>                                                        iscsilun);
>       }
> 
> -#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
> -    /* Set up a timer for sending out iSCSI NOPs */
> -    iscsilun->nop_timer = timer_new_ms(QEMU_CLOCK_REALTIME, iscsi_nop_timed_event, iscsilun);
> -    timer_mod(iscsilun->nop_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
> -#endif
> -
> +    ret = 0;
>   out:
> -    qemu_opts_del(opts);
> -    if (initiator_name != NULL) {
> -        g_free(initiator_name);
> -    }
> -    if (iscsi_url != NULL) {
> -        iscsi_destroy_url(iscsi_url);
> -    }
>       if (task != NULL) {
>           scsi_free_scsi_task(task);
>       }
> -
> -    if (ret) {
> -        if (iscsi != NULL) {
> -            iscsi_destroy_context(iscsi);
> -        }
> -        memset(iscsilun, 0, sizeof(IscsiLun));
> -    }
>       return ret;
>   }
> 
> -static void iscsi_close(BlockDriverState *bs)
> -{
> -    IscsiLun *iscsilun = bs->opaque;
> -    struct iscsi_context *iscsi = iscsilun->iscsi;
> -
> -    if (iscsilun->nop_timer) {
> -        timer_del(iscsilun->nop_timer);
> -        timer_free(iscsilun->nop_timer);
> -    }
> -    qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), NULL, NULL, NULL);
> -    iscsi_destroy_context(iscsi);
> -    g_free(iscsilun->zeroblock);
> -    memset(iscsilun, 0, sizeof(IscsiLun));
> -}
> -
>   static int iscsi_truncate(BlockDriverState *bs, int64_t offset)
>   {
>       IscsiLun *iscsilun = bs->opaque;
> @@ -1616,6 +1632,7 @@ static BlockDriver bdrv_iscsi = {
>       .bdrv_getlength  = iscsi_getlength,
>       .bdrv_get_info   = iscsi_get_info,
>       .bdrv_truncate   = iscsi_truncate,
> +    .bdrv_refresh_limits = iscsi_refresh_limits,
> 
>   #if defined(LIBISCSI_FEATURE_IOVECTOR)
>       .bdrv_co_get_block_status = iscsi_co_get_block_status,
> diff --git a/block/qcow2.c b/block/qcow2.c
> index f29aa88..f40355c 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -718,7 +718,6 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>       }
> 
>       qemu_opts_del(opts);
> -    bs->bl.write_zeroes_alignment = s->cluster_sectors;
> 
>       if (s->use_lazy_refcounts && s->qcow_version < 3) {
>           error_setg(errp, "Lazy refcounts require a qcow2 image with at least "
> @@ -751,6 +750,15 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>       return ret;
>   }
> 
> +static int qcow2_refresh_limits(BlockDriverState *bs)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +
> +    bs->bl.write_zeroes_alignment = s->cluster_sectors;
> +
> +    return 0;
> +}
> +
>   static int qcow2_set_key(BlockDriverState *bs, const char *key)
>   {
>       BDRVQcowState *s = bs->opaque;
> @@ -2268,6 +2276,7 @@ static BlockDriver bdrv_qcow2 = {
> 
>       .bdrv_change_backing_file   = qcow2_change_backing_file,
> 
> +    .bdrv_refresh_limits        = qcow2_refresh_limits,
>       .bdrv_invalidate_cache      = qcow2_invalidate_cache,
> 
>       .create_options = qcow2_create_options,
> diff --git a/block/qed.c b/block/qed.c
> index 450a1fa..d35bd1c 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -495,7 +495,6 @@ static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
>           }
>       }
> 
> -    bs->bl.write_zeroes_alignment = s->header.cluster_size >> BDRV_SECTOR_BITS;
>       s->need_check_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
>                                               qed_need_check_timer_cb, s);
> 
> @@ -507,6 +506,15 @@ out:
>       return ret;
>   }
> 
> +static int bdrv_qed_refresh_limits(BlockDriverState *bs)
> +{
> +    BDRVQEDState *s = bs->opaque;
> +
> +    bs->bl.write_zeroes_alignment = s->header.cluster_size >> BDRV_SECTOR_BITS;
> +
> +    return 0;
> +}
> +
>   /* We have nothing to do for QED reopen, stubs just return
>    * success */
>   static int bdrv_qed_reopen_prepare(BDRVReopenState *state,
> @@ -1616,6 +1624,7 @@ static BlockDriver bdrv_qed = {
>       .bdrv_truncate            = bdrv_qed_truncate,
>       .bdrv_getlength           = bdrv_qed_getlength,
>       .bdrv_get_info            = bdrv_qed_get_info,
> +    .bdrv_refresh_limits      = bdrv_qed_refresh_limits,
>       .bdrv_change_backing_file = bdrv_qed_change_backing_file,
>       .bdrv_invalidate_cache    = bdrv_qed_invalidate_cache,
>       .bdrv_check               = bdrv_qed_check,
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 0734bc2..f8a387f 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -428,10 +428,6 @@ static int vmdk_add_extent(BlockDriverState *bs,
>       extent->l2_size = l2_size;
>       extent->cluster_sectors = flat ? sectors : cluster_sectors;
> 
> -    if (!flat) {
> -        bs->bl.write_zeroes_alignment =
> -            MAX(bs->bl.write_zeroes_alignment, cluster_sectors);
> -    }
>       if (s->num_extents > 1) {
>           extent->end_sector = (*(extent - 1)).end_sector + extent->sectors;
>       } else {
> @@ -886,6 +882,23 @@ fail:
>       return ret;
>   }
> 
> +
> +static int vmdk_refresh_limits(BlockDriverState *bs)
> +{
> +    BDRVVmdkState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->num_extents; i++) {
> +        if (!s->extents[i].flat) {
> +            bs->bl.write_zeroes_alignment =
> +                MAX(bs->bl.write_zeroes_alignment,
> +                    s->extents[i].cluster_sectors);
> +        }
> +    }
> +
> +    return 0;
> +}
> +
>   static int get_whole_cluster(BlockDriverState *bs,
>                   VmdkExtent *extent,
>                   uint64_t cluster_offset,
> @@ -1971,6 +1984,7 @@ static BlockDriver bdrv_vmdk = {
>       .bdrv_get_allocated_file_size = vmdk_get_allocated_file_size,
>       .bdrv_has_zero_init           = vmdk_has_zero_init,
>       .bdrv_get_specific_info       = vmdk_get_specific_info,
> +    .bdrv_refresh_limits          = vmdk_refresh_limits,
> 
>       .create_options               = vmdk_create_options,
>   };
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 8b132d7..c49fa6b 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -226,6 +226,8 @@ struct BlockDriver {
>       int (*bdrv_debug_resume)(BlockDriverState *bs, const char *tag);
>       bool (*bdrv_debug_is_suspended)(BlockDriverState *bs, const char *tag);
> 
> +    int (*bdrv_refresh_limits)(BlockDriverState *bs);
> +
>       /*
>        * Returns 1 if newly created images are guaranteed to contain only
>        * zeros, 0 otherwise.
> 

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 01/22] block: Move initialisation of BlockLimits to bdrv_refresh_limits()
  2013-12-12  2:54   ` Wenchao Xia
@ 2013-12-12  2:57     ` Wenchao Xia
  2013-12-12  7:51       ` Thomas Huth
  0 siblings, 1 reply; 39+ messages in thread
From: Wenchao Xia @ 2013-12-12  2:57 UTC (permalink / raw)
  To: Kevin Wolf, qemu-devel; +Cc: pbonzini, pl, stefanha


>> +static int bdrv_refresh_limits(BlockDriverState *bs)
>> +{
>> +    BlockDriver *drv = bs->drv;
>> +
>> +    memset(&bs->bl, 0, sizeof(bs->bl));
>> +
>> +    if (!drv) {
>> +        return 0;
>> +    } else if (drv->bdrv_refresh_limits) {
>> +        return drv->bdrv_refresh_limits(bs);
>> +    }
>> +
>> +    return 0;
>      It seems this line can be removed.
> 
  I missed the "else if", then the patch is OK.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 02/22] block: Inherit opt_transfer_length
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 02/22] block: Inherit opt_transfer_length Kevin Wolf
@ 2013-12-12  3:02   ` Wenchao Xia
  0 siblings, 0 replies; 39+ messages in thread
From: Wenchao Xia @ 2013-12-12  3:02 UTC (permalink / raw)
  To: Kevin Wolf, qemu-devel; +Cc: pbonzini, pl, stefanha

Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 03/22] block: Update BlockLimits when they might have changed
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 03/22] block: Update BlockLimits when they might have changed Kevin Wolf
@ 2013-12-12  3:09   ` Wenchao Xia
  0 siblings, 0 replies; 39+ messages in thread
From: Wenchao Xia @ 2013-12-12  3:09 UTC (permalink / raw)
  To: Kevin Wolf, qemu-devel; +Cc: pbonzini, pl, stefanha

Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 04/22] qemu_memalign: Allow small alignments
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 04/22] qemu_memalign: Allow small alignments Kevin Wolf
@ 2013-12-12  3:09   ` Wenchao Xia
  0 siblings, 0 replies; 39+ messages in thread
From: Wenchao Xia @ 2013-12-12  3:09 UTC (permalink / raw)
  To: Kevin Wolf, qemu-devel; +Cc: pbonzini, pl, stefanha

Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 05/22] block: Detect unaligned length in bdrv_qiov_is_aligned()
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 05/22] block: Detect unaligned length in bdrv_qiov_is_aligned() Kevin Wolf
@ 2013-12-12  3:15   ` Wenchao Xia
  0 siblings, 0 replies; 39+ messages in thread
From: Wenchao Xia @ 2013-12-12  3:15 UTC (permalink / raw)
  To: Kevin Wolf, qemu-devel; +Cc: pbonzini, pl, stefanha

Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation
  2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
                   ` (21 preceding siblings ...)
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 22/22] block: Make bdrv_pwrite() " Kevin Wolf
@ 2013-12-12  6:09 ` Peter Lieven
  2013-12-12  9:47   ` Kevin Wolf
  22 siblings, 1 reply; 39+ messages in thread
From: Peter Lieven @ 2013-12-12  6:09 UTC (permalink / raw)
  To: Kevin Wolf, qemu-devel; +Cc: pbonzini, stefanha

Should it be possible to boot from a 4K-native drive with this series?
If yes, I will run some test with some new iSCSI arrays we got for testing
they can export 4k blocksize LUNs.

Anyway, can you please include the patch from Paolo which sets
the bs->request_alignment = iscsilun->block_size in iscsi_open.
It should be a one-liner.

Peter

Am 11.12.2013 22:08, schrieb Kevin Wolf:
> This patch series adds code to the block layer that allows performing
> I/O requests in smaller granularities than required by the host backend
> (most importantly, O_DIRECT restrictions). It achieves this for reads
> by rounding the request to host-side block boundary, and for writes by
> performing a read-modify-write cycle (and serialising requests
> touching the same block so that the RMW doesn't write back stale data).
>
> Originally I intended to reuse a lot of code from Paolo's previous
> patch series, however as I tried to integrate pread/pwrite, which
> already do a very similar thing (except for considering concurrency),
> and because I wanted to implement zero-copy, most of this series ended
> up being new code.
>
> Zero-copy is possible in a common case because while XFS defauls to a
> 4k sector size and therefore 4k on-disk O_DIRECT alignment for 512E
> disks, it still only has a 512 byte memory alignment requirement.
> (Unfortunately the XFS_IOC_DIOINFO ioctl claims 4k even for memory, but
> we know that the value is wrong and can probe it.)
>
>
> Changes since RFC:
> - Moved opt_mem_alignment into BlockLimits [Paolo]
>
> - Changed BlockLimits in turn to work a bit more like the
>   .bdrv_opt_mem_align() callback of the RFC; allows updating the
>   BlockLimits later when the chain changes or bdrv_reopen() toggles
>   O_DIRECT
>
> - Fixed a typo in a commit message [Eric]
>
>
> Kevin Wolf (20):
>   block: Move initialisation of BlockLimits to bdrv_refresh_limits()
>   block: Inherit opt_transfer_length
>   block: Update BlockLimits when they might have changed
>   qemu_memalign: Allow small alignments
>   block: Detect unaligned length in bdrv_qiov_is_aligned()
>   block: Don't use guest sector size for qemu_blockalign()
>   block: Introduce bdrv_aligned_preadv()
>   block: Introduce bdrv_co_do_preadv()
>   block: Introduce bdrv_aligned_pwritev()
>   block: write: Handle COR dependency after I/O throttling
>   block: Introduce bdrv_co_do_pwritev()
>   block: Switch BdrvTrackedRequest to byte granularity
>   block: Allow waiting for overlapping requests between begin/end
>   block: Make zero-after-EOF work with larger alignment
>   block: Generalise and optimise COR serialisation
>   block: Make overlap range for serialisation dynamic
>   block: Align requests in bdrv_co_do_pwritev()
>   block: Change coroutine wrapper to byte granularity
>   block: Make bdrv_pread() a bdrv_prwv_co() wrapper
>   block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper
>
> Paolo Bonzini (2):
>   block: rename buffer_alignment to guest_block_size
>   raw: Probe required direct I/O alignment
>
>  block.c                   | 602 +++++++++++++++++++++++++++++++---------------
>  block/backup.c            |   7 +-
>  block/iscsi.c             |  87 ++++---
>  block/qcow2.c             |  11 +-
>  block/qed.c               |  11 +-
>  block/raw-posix.c         | 102 ++++++--
>  block/raw-win32.c         |  41 ++++
>  block/stream.c            |   2 +
>  block/vmdk.c              |  22 +-
>  hw/block/virtio-blk.c     |   2 +-
>  hw/ide/core.c             |   2 +-
>  hw/scsi/scsi-disk.c       |   2 +-
>  hw/scsi/scsi-generic.c    |   2 +-
>  include/block/block.h     |   6 +-
>  include/block/block_int.h |  25 +-
>  util/oslib-posix.c        |   5 +
>  16 files changed, 666 insertions(+), 263 deletions(-)
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 01/22] block: Move initialisation of BlockLimits to bdrv_refresh_limits()
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 01/22] block: Move initialisation of BlockLimits to bdrv_refresh_limits() Kevin Wolf
  2013-12-12  2:54   ` Wenchao Xia
@ 2013-12-12  6:17   ` Peter Lieven
  2013-12-12  9:35     ` Kevin Wolf
  1 sibling, 1 reply; 39+ messages in thread
From: Peter Lieven @ 2013-12-12  6:17 UTC (permalink / raw)
  To: Kevin Wolf, qemu-devel; +Cc: pbonzini, stefanha

Am 11.12.2013 22:08, schrieb Kevin Wolf:
> This function separates filling the BlockLimits from bdrv_open(), which
> allows it to call it from other operations which may change the limits
> (e.g. modifications to the backing file chain or bdrv_reopen)
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block.c                   | 21 ++++++++++++
>  block/iscsi.c             | 87 ++++++++++++++++++++++++++++-------------------
>  block/qcow2.c             | 11 +++++-
>  block/qed.c               | 11 +++++-
>  block/vmdk.c              | 22 +++++++++---
>  include/block/block_int.h |  2 ++
>  6 files changed, 113 insertions(+), 41 deletions(-)
>
> diff --git a/block.c b/block.c
> index 13f001a..c32d856 100644
> --- a/block.c
> +++ b/block.c
> @@ -479,6 +479,21 @@ int bdrv_create_file(const char* filename, QEMUOptionParameter *options,
>      return ret;
>  }
>  
> +static int bdrv_refresh_limits(BlockDriverState *bs)
> +{
> +    BlockDriver *drv = bs->drv;
> +
> +    memset(&bs->bl, 0, sizeof(bs->bl));
> +
> +    if (!drv) {
> +        return 0;
> +    } else if (drv->bdrv_refresh_limits) {
> +        return drv->bdrv_refresh_limits(bs);
> +    }
> +
> +    return 0;
> +}
> +
>  /*
>   * Create a uniquely-named empty temporary file.
>   * Return 0 upon success, otherwise a negative errno value.
> @@ -833,6 +848,8 @@ static int bdrv_open_common(BlockDriverState *bs, BlockDriverState *file,
>          goto free_and_fail;
>      }
>  
> +    bdrv_refresh_limits(bs);
> +
>  #ifndef _WIN32
>      if (bs->is_temporary) {
>          assert(bs->filename[0] != '\0');
> @@ -1018,6 +1035,10 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *options, Error **errp)
>      }
>      pstrcpy(bs->backing_file, sizeof(bs->backing_file),
>              bs->backing_hd->file->filename);
> +
> +    /* Recalculate the BlockLimits with the backing file */
> +    bdrv_refresh_limits(bs);
> +
>      return 0;
>  }
>  
> diff --git a/block/iscsi.c b/block/iscsi.c
> index 829d444..f3ded8c 100644
> --- a/block/iscsi.c
> +++ b/block/iscsi.c
> @@ -1425,6 +1425,56 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
>          task = NULL;
>      }
>  
> +#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
> +    /* Set up a timer for sending out iSCSI NOPs */
> +    iscsilun->nop_timer = timer_new_ms(QEMU_CLOCK_REALTIME, iscsi_nop_timed_event, iscsilun);
> +    timer_mod(iscsilun->nop_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
> +#endif
> +
> +out:
> +    qemu_opts_del(opts);
> +    if (initiator_name != NULL) {
> +        g_free(initiator_name);
> +    }
> +    if (iscsi_url != NULL) {
> +        iscsi_destroy_url(iscsi_url);
> +    }
> +    if (task != NULL) {
> +        scsi_free_scsi_task(task);
> +    }
> +
> +    if (ret) {
> +        if (iscsi != NULL) {
> +            iscsi_destroy_context(iscsi);
> +        }
> +        memset(iscsilun, 0, sizeof(IscsiLun));
> +    }
> +    return ret;
> +}
> +
> +static void iscsi_close(BlockDriverState *bs)
> +{
> +    IscsiLun *iscsilun = bs->opaque;
> +    struct iscsi_context *iscsi = iscsilun->iscsi;
> +
> +    if (iscsilun->nop_timer) {
> +        timer_del(iscsilun->nop_timer);
> +        timer_free(iscsilun->nop_timer);
> +    }
> +    qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), NULL, NULL, NULL);
> +    iscsi_destroy_context(iscsi);
> +    g_free(iscsilun->zeroblock);
> +    memset(iscsilun, 0, sizeof(IscsiLun));
> +}
> +
> +static int iscsi_refresh_limits(BlockDriverState *bs)
> +{
> +    IscsiLun *iscsilun = bs->opaque;
> +    struct scsi_task *task = NULL;
> +    int ret;
> +
> +    memset(&bs->bl, 0, sizeof(bs->bl));
> +
you do that memset in bdrv_refresh_limits already.
>      if (iscsilun->lbp.lbpu || iscsilun->lbp.lbpws) {
>          struct scsi_inquiry_block_limits *inq_bl;
>          task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
> @@ -1462,48 +1512,14 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,

iscsi_do_inquiry does a sync call to request the pages from the target.
you can only do that here if you can guarantee that iscsi_refresh_limits will
be called with no pending requests in-flight otherwise this will end in a mess.

otherwise i would propose to leave the inquiry call ins iscsi_open and just fill
the bs->bl struc tin iscsi_refresh_limits. the problem is if we make this async
we have to handle all possible I/O callbacks whiile in iscsi_refresh_limits.
i do not see how an iscsi target can change its limits while a target is mounted.

Peter

>                                                       iscsilun);
>      }
>  
> -#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
> -    /* Set up a timer for sending out iSCSI NOPs */
> -    iscsilun->nop_timer = timer_new_ms(QEMU_CLOCK_REALTIME, iscsi_nop_timed_event, iscsilun);
> -    timer_mod(iscsilun->nop_timer, qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
> -#endif
> -
> +    ret = 0;
>  out:
> -    qemu_opts_del(opts);
> -    if (initiator_name != NULL) {
> -        g_free(initiator_name);
> -    }
> -    if (iscsi_url != NULL) {
> -        iscsi_destroy_url(iscsi_url);
> -    }
>      if (task != NULL) {
>          scsi_free_scsi_task(task);
>      }
> -
> -    if (ret) {
> -        if (iscsi != NULL) {
> -            iscsi_destroy_context(iscsi);
> -        }
> -        memset(iscsilun, 0, sizeof(IscsiLun));
> -    }
>      return ret;
>  }
>  
> -static void iscsi_close(BlockDriverState *bs)
> -{
> -    IscsiLun *iscsilun = bs->opaque;
> -    struct iscsi_context *iscsi = iscsilun->iscsi;
> -
> -    if (iscsilun->nop_timer) {
> -        timer_del(iscsilun->nop_timer);
> -        timer_free(iscsilun->nop_timer);
> -    }
> -    qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), NULL, NULL, NULL);
> -    iscsi_destroy_context(iscsi);
> -    g_free(iscsilun->zeroblock);
> -    memset(iscsilun, 0, sizeof(IscsiLun));
> -}
> -
>  static int iscsi_truncate(BlockDriverState *bs, int64_t offset)
>  {
>      IscsiLun *iscsilun = bs->opaque;
> @@ -1616,6 +1632,7 @@ static BlockDriver bdrv_iscsi = {
>      .bdrv_getlength  = iscsi_getlength,
>      .bdrv_get_info   = iscsi_get_info,
>      .bdrv_truncate   = iscsi_truncate,
> +    .bdrv_refresh_limits = iscsi_refresh_limits,
>  
>  #if defined(LIBISCSI_FEATURE_IOVECTOR)
>      .bdrv_co_get_block_status = iscsi_co_get_block_status,
> diff --git a/block/qcow2.c b/block/qcow2.c
> index f29aa88..f40355c 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -718,7 +718,6 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>      }
>  
>      qemu_opts_del(opts);
> -    bs->bl.write_zeroes_alignment = s->cluster_sectors;
>  
>      if (s->use_lazy_refcounts && s->qcow_version < 3) {
>          error_setg(errp, "Lazy refcounts require a qcow2 image with at least "
> @@ -751,6 +750,15 @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags,
>      return ret;
>  }
>  
> +static int qcow2_refresh_limits(BlockDriverState *bs)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +
> +    bs->bl.write_zeroes_alignment = s->cluster_sectors;
> +
> +    return 0;
> +}
> +
>  static int qcow2_set_key(BlockDriverState *bs, const char *key)
>  {
>      BDRVQcowState *s = bs->opaque;
> @@ -2268,6 +2276,7 @@ static BlockDriver bdrv_qcow2 = {
>  
>      .bdrv_change_backing_file   = qcow2_change_backing_file,
>  
> +    .bdrv_refresh_limits        = qcow2_refresh_limits,
>      .bdrv_invalidate_cache      = qcow2_invalidate_cache,
>  
>      .create_options = qcow2_create_options,
> diff --git a/block/qed.c b/block/qed.c
> index 450a1fa..d35bd1c 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -495,7 +495,6 @@ static int bdrv_qed_open(BlockDriverState *bs, QDict *options, int flags,
>          }
>      }
>  
> -    bs->bl.write_zeroes_alignment = s->header.cluster_size >> BDRV_SECTOR_BITS;
>      s->need_check_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
>                                              qed_need_check_timer_cb, s);
>  
> @@ -507,6 +506,15 @@ out:
>      return ret;
>  }
>  
> +static int bdrv_qed_refresh_limits(BlockDriverState *bs)
> +{
> +    BDRVQEDState *s = bs->opaque;
> +
> +    bs->bl.write_zeroes_alignment = s->header.cluster_size >> BDRV_SECTOR_BITS;
> +
> +    return 0;
> +}
> +
>  /* We have nothing to do for QED reopen, stubs just return
>   * success */
>  static int bdrv_qed_reopen_prepare(BDRVReopenState *state,
> @@ -1616,6 +1624,7 @@ static BlockDriver bdrv_qed = {
>      .bdrv_truncate            = bdrv_qed_truncate,
>      .bdrv_getlength           = bdrv_qed_getlength,
>      .bdrv_get_info            = bdrv_qed_get_info,
> +    .bdrv_refresh_limits      = bdrv_qed_refresh_limits,
>      .bdrv_change_backing_file = bdrv_qed_change_backing_file,
>      .bdrv_invalidate_cache    = bdrv_qed_invalidate_cache,
>      .bdrv_check               = bdrv_qed_check,
> diff --git a/block/vmdk.c b/block/vmdk.c
> index 0734bc2..f8a387f 100644
> --- a/block/vmdk.c
> +++ b/block/vmdk.c
> @@ -428,10 +428,6 @@ static int vmdk_add_extent(BlockDriverState *bs,
>      extent->l2_size = l2_size;
>      extent->cluster_sectors = flat ? sectors : cluster_sectors;
>  
> -    if (!flat) {
> -        bs->bl.write_zeroes_alignment =
> -            MAX(bs->bl.write_zeroes_alignment, cluster_sectors);
> -    }
>      if (s->num_extents > 1) {
>          extent->end_sector = (*(extent - 1)).end_sector + extent->sectors;
>      } else {
> @@ -886,6 +882,23 @@ fail:
>      return ret;
>  }
>  
> +
> +static int vmdk_refresh_limits(BlockDriverState *bs)
> +{
> +    BDRVVmdkState *s = bs->opaque;
> +    int i;
> +
> +    for (i = 0; i < s->num_extents; i++) {
> +        if (!s->extents[i].flat) {
> +            bs->bl.write_zeroes_alignment =
> +                MAX(bs->bl.write_zeroes_alignment,
> +                    s->extents[i].cluster_sectors);
> +        }
> +    }
> +
> +    return 0;
> +}
> +
>  static int get_whole_cluster(BlockDriverState *bs,
>                  VmdkExtent *extent,
>                  uint64_t cluster_offset,
> @@ -1971,6 +1984,7 @@ static BlockDriver bdrv_vmdk = {
>      .bdrv_get_allocated_file_size = vmdk_get_allocated_file_size,
>      .bdrv_has_zero_init           = vmdk_has_zero_init,
>      .bdrv_get_specific_info       = vmdk_get_specific_info,
> +    .bdrv_refresh_limits          = vmdk_refresh_limits,
>  
>      .create_options               = vmdk_create_options,
>  };
> diff --git a/include/block/block_int.h b/include/block/block_int.h
> index 8b132d7..c49fa6b 100644
> --- a/include/block/block_int.h
> +++ b/include/block/block_int.h
> @@ -226,6 +226,8 @@ struct BlockDriver {
>      int (*bdrv_debug_resume)(BlockDriverState *bs, const char *tag);
>      bool (*bdrv_debug_is_suspended)(BlockDriverState *bs, const char *tag);
>  
> +    int (*bdrv_refresh_limits)(BlockDriverState *bs);
> +
>      /*
>       * Returns 1 if newly created images are guaranteed to contain only
>       * zeros, 0 otherwise.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 06/22] block: Don't use guest sector size for qemu_blockalign()
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 06/22] block: Don't use guest sector size for qemu_blockalign() Kevin Wolf
@ 2013-12-12  6:32   ` Wenchao Xia
  0 siblings, 0 replies; 39+ messages in thread
From: Wenchao Xia @ 2013-12-12  6:32 UTC (permalink / raw)
  To: Kevin Wolf, qemu-devel; +Cc: pbonzini, pl, stefanha

Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 07/22] block: rename buffer_alignment to guest_block_size
  2013-12-11 21:08 ` [Qemu-devel] [PATCH 07/22] block: rename buffer_alignment to guest_block_size Kevin Wolf
@ 2013-12-12  6:34   ` Wenchao Xia
  0 siblings, 0 replies; 39+ messages in thread
From: Wenchao Xia @ 2013-12-12  6:34 UTC (permalink / raw)
  To: Kevin Wolf, qemu-devel; +Cc: pbonzini, pl, stefanha

Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 01/22] block: Move initialisation of BlockLimits to bdrv_refresh_limits()
  2013-12-12  2:57     ` Wenchao Xia
@ 2013-12-12  7:51       ` Thomas Huth
  2013-12-12  9:32         ` Kevin Wolf
  0 siblings, 1 reply; 39+ messages in thread
From: Thomas Huth @ 2013-12-12  7:51 UTC (permalink / raw)
  To: Wenchao Xia; +Cc: Kevin Wolf, pbonzini, pl, qemu-devel, stefanha

On Thu, 12 Dec 2013 10:57:49 +0800
Wenchao Xia <xiawenc@linux.vnet.ibm.com> wrote:

> 
> >> +static int bdrv_refresh_limits(BlockDriverState *bs)
> >> +{
> >> +    BlockDriver *drv = bs->drv;
> >> +
> >> +    memset(&bs->bl, 0, sizeof(bs->bl));
> >> +
> >> +    if (!drv) {
> >> +        return 0;
> >> +    } else if (drv->bdrv_refresh_limits) {
> >> +        return drv->bdrv_refresh_limits(bs);
> >> +    }
> >> +
> >> +    return 0;
> >      It seems this line can be removed.
> > 
>   I missed the "else if", then the patch is OK.
 
But it could also be written in a shorter way:

    if (drv && drv->bdrv_refresh_limits) {
        return drv->bdrv_refresh_limits(bs);
    }

    return 0;

 Regards,
  Thomas

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 01/22] block: Move initialisation of BlockLimits to bdrv_refresh_limits()
  2013-12-12  7:51       ` Thomas Huth
@ 2013-12-12  9:32         ` Kevin Wolf
  0 siblings, 0 replies; 39+ messages in thread
From: Kevin Wolf @ 2013-12-12  9:32 UTC (permalink / raw)
  To: Thomas Huth; +Cc: pbonzini, pl, Wenchao Xia, stefanha, qemu-devel

Am 12.12.2013 um 08:51 hat Thomas Huth geschrieben:
> On Thu, 12 Dec 2013 10:57:49 +0800
> Wenchao Xia <xiawenc@linux.vnet.ibm.com> wrote:
> 
> > 
> > >> +static int bdrv_refresh_limits(BlockDriverState *bs)
> > >> +{
> > >> +    BlockDriver *drv = bs->drv;
> > >> +
> > >> +    memset(&bs->bl, 0, sizeof(bs->bl));
> > >> +
> > >> +    if (!drv) {
> > >> +        return 0;
> > >> +    } else if (drv->bdrv_refresh_limits) {
> > >> +        return drv->bdrv_refresh_limits(bs);
> > >> +    }
> > >> +
> > >> +    return 0;
> > >      It seems this line can be removed.
> > > 
> >   I missed the "else if", then the patch is OK.
>  
> But it could also be written in a shorter way:
> 
>     if (drv && drv->bdrv_refresh_limits) {
>         return drv->bdrv_refresh_limits(bs);
>     }
> 
>     return 0;

Indeed, with some code changes, this has become a bit more complicated
than necessary. I need to touch the patch anyway for Peter's comments,
so I'll change it, even though it disappears anyway later in the series.

Kevin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 01/22] block: Move initialisation of BlockLimits to bdrv_refresh_limits()
  2013-12-12  6:17   ` Peter Lieven
@ 2013-12-12  9:35     ` Kevin Wolf
  0 siblings, 0 replies; 39+ messages in thread
From: Kevin Wolf @ 2013-12-12  9:35 UTC (permalink / raw)
  To: Peter Lieven; +Cc: pbonzini, qemu-devel, stefanha

Am 12.12.2013 um 07:17 hat Peter Lieven geschrieben:
> Am 11.12.2013 22:08, schrieb Kevin Wolf:
> > This function separates filling the BlockLimits from bdrv_open(), which
> > allows it to call it from other operations which may change the limits
> > (e.g. modifications to the backing file chain or bdrv_reopen)
> >
> > Signed-off-by: Kevin Wolf <kwolf@redhat.com>

> > +static int iscsi_refresh_limits(BlockDriverState *bs)
> > +{
> > +    IscsiLun *iscsilun = bs->opaque;
> > +    struct scsi_task *task = NULL;
> > +    int ret;
> > +
> > +    memset(&bs->bl, 0, sizeof(bs->bl));
> > +
> you do that memset in bdrv_refresh_limits already.

Right, I'll drop it here.

> >      if (iscsilun->lbp.lbpu || iscsilun->lbp.lbpws) {
> >          struct scsi_inquiry_block_limits *inq_bl;
> >          task = iscsi_do_inquiry(iscsilun->iscsi, iscsilun->lun, 1,
> > @@ -1462,48 +1512,14 @@ static int iscsi_open(BlockDriverState *bs, QDict *options, int flags,
> 
> iscsi_do_inquiry does a sync call to request the pages from the target.
> you can only do that here if you can guarantee that iscsi_refresh_limits will
> be called with no pending requests in-flight otherwise this will end in a mess.
> 
> otherwise i would propose to leave the inquiry call ins iscsi_open and just fill
> the bs->bl struc tin iscsi_refresh_limits. the problem is if we make this async
> we have to handle all possible I/O callbacks whiile in iscsi_refresh_limits.
> i do not see how an iscsi target can change its limits while a target is mounted.

Yes, I'll just split the block in two, leaving the actual request in
iscsi_open().

Thanks for your review.

Kevin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation
  2013-12-12  6:09 ` [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Peter Lieven
@ 2013-12-12  9:47   ` Kevin Wolf
  2013-12-12 10:26     ` Peter Lieven
  2013-12-12 12:48     ` Peter Lieven
  0 siblings, 2 replies; 39+ messages in thread
From: Kevin Wolf @ 2013-12-12  9:47 UTC (permalink / raw)
  To: Peter Lieven; +Cc: pbonzini, qemu-devel, stefanha

Am 12.12.2013 um 07:09 hat Peter Lieven geschrieben:
> Should it be possible to boot from a 4K-native drive with this series?
> If yes, I will run some test with some new iSCSI arrays we got for testing
> they can export 4k blocksize LUNs.

Yes, you should be able to use a 4k-native backend for a 512b-based
guest with this series (at the cost of some performance, of course).

Thanks for your testing offer!

> Anyway, can you please include the patch from Paolo which sets
> the bs->request_alignment = iscsilun->block_size in iscsi_open.
> It should be a one-liner.

Sure, I've included it now. For your test, it's probably easiest if you
pull the updated version from here:

    git://repo.or.cz/qemu/kevin.git align

Do you think you'll run some tests already today? If so, I could wait a
bit with sending out a v2 and see what your first results are.

Kevin

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation
  2013-12-12  9:47   ` Kevin Wolf
@ 2013-12-12 10:26     ` Peter Lieven
  2013-12-12 12:48     ` Peter Lieven
  1 sibling, 0 replies; 39+ messages in thread
From: Peter Lieven @ 2013-12-12 10:26 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: pbonzini, qemu-devel, stefanha

Am 12.12.2013 10:47, schrieb Kevin Wolf:
> Am 12.12.2013 um 07:09 hat Peter Lieven geschrieben:
>> Should it be possible to boot from a 4K-native drive with this series?
>> If yes, I will run some test with some new iSCSI arrays we got for testing
>> they can export 4k blocksize LUNs.
> Yes, you should be able to use a 4k-native backend for a 512b-based
> guest with this series (at the cost of some performance, of course).
>
> Thanks for your testing offer!
>
>> Anyway, can you please include the patch from Paolo which sets
>> the bs->request_alignment = iscsilun->block_size in iscsi_open.
>> It should be a one-liner.
> Sure, I've included it now. For your test, it's probably easiest if you
> pull the updated version from here:
>
>     git://repo.or.cz/qemu/kevin.git align
>
> Do you think you'll run some tests already today? If so, I could wait a
> bit with sending out a v2 and see what your first results are.
I can do some basic testing like installing Linux and Windows on
such a partition and see if it succeeds.

Peter

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation
  2013-12-12  9:47   ` Kevin Wolf
  2013-12-12 10:26     ` Peter Lieven
@ 2013-12-12 12:48     ` Peter Lieven
  1 sibling, 0 replies; 39+ messages in thread
From: Peter Lieven @ 2013-12-12 12:48 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: pbonzini, qemu-devel, stefanha

Am 12.12.2013 10:47, schrieb Kevin Wolf:
> Am 12.12.2013 um 07:09 hat Peter Lieven geschrieben:
>> Should it be possible to boot from a 4K-native drive with this series?
>> If yes, I will run some test with some new iSCSI arrays we got for testing
>> they can export 4k blocksize LUNs.
> Yes, you should be able to use a 4k-native backend for a 512b-based
> guest with this series (at the cost of some performance, of course).
>
> Thanks for your testing offer!
>
>> Anyway, can you please include the patch from Paolo which sets
>> the bs->request_alignment = iscsilun->block_size in iscsi_open.
>> It should be a one-liner.
> Sure, I've included it now. For your test, it's probably easiest if you
> pull the updated version from here:
>
>     git://repo.or.cz/qemu/kevin.git align
>
> Do you think you'll run some tests already today? If so, I could wait a
> bit with sending out a v2 and see what your first results are.
Will take some more time. Found a but in the implementation of my storage
which hit a bug in qemu. Will send a fix for the latter one shortly.

Peter

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2013-12-12 12:48 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-12-11 21:08 [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Kevin Wolf
2013-12-11 21:08 ` [Qemu-devel] [PATCH 01/22] block: Move initialisation of BlockLimits to bdrv_refresh_limits() Kevin Wolf
2013-12-12  2:54   ` Wenchao Xia
2013-12-12  2:57     ` Wenchao Xia
2013-12-12  7:51       ` Thomas Huth
2013-12-12  9:32         ` Kevin Wolf
2013-12-12  6:17   ` Peter Lieven
2013-12-12  9:35     ` Kevin Wolf
2013-12-11 21:08 ` [Qemu-devel] [PATCH 02/22] block: Inherit opt_transfer_length Kevin Wolf
2013-12-12  3:02   ` Wenchao Xia
2013-12-11 21:08 ` [Qemu-devel] [PATCH 03/22] block: Update BlockLimits when they might have changed Kevin Wolf
2013-12-12  3:09   ` Wenchao Xia
2013-12-11 21:08 ` [Qemu-devel] [PATCH 04/22] qemu_memalign: Allow small alignments Kevin Wolf
2013-12-12  3:09   ` Wenchao Xia
2013-12-11 21:08 ` [Qemu-devel] [PATCH 05/22] block: Detect unaligned length in bdrv_qiov_is_aligned() Kevin Wolf
2013-12-12  3:15   ` Wenchao Xia
2013-12-11 21:08 ` [Qemu-devel] [PATCH 06/22] block: Don't use guest sector size for qemu_blockalign() Kevin Wolf
2013-12-12  6:32   ` Wenchao Xia
2013-12-11 21:08 ` [Qemu-devel] [PATCH 07/22] block: rename buffer_alignment to guest_block_size Kevin Wolf
2013-12-12  6:34   ` Wenchao Xia
2013-12-11 21:08 ` [Qemu-devel] [PATCH 08/22] raw: Probe required direct I/O alignment Kevin Wolf
2013-12-11 21:08 ` [Qemu-devel] [PATCH 09/22] block: Introduce bdrv_aligned_preadv() Kevin Wolf
2013-12-11 21:08 ` [Qemu-devel] [PATCH 10/22] block: Introduce bdrv_co_do_preadv() Kevin Wolf
2013-12-11 21:08 ` [Qemu-devel] [PATCH 11/22] block: Introduce bdrv_aligned_pwritev() Kevin Wolf
2013-12-11 21:08 ` [Qemu-devel] [PATCH 12/22] block: write: Handle COR dependency after I/O throttling Kevin Wolf
2013-12-11 21:08 ` [Qemu-devel] [PATCH 13/22] block: Introduce bdrv_co_do_pwritev() Kevin Wolf
2013-12-11 21:08 ` [Qemu-devel] [PATCH 14/22] block: Switch BdrvTrackedRequest to byte granularity Kevin Wolf
2013-12-11 21:08 ` [Qemu-devel] [PATCH 15/22] block: Allow waiting for overlapping requests between begin/end Kevin Wolf
2013-12-11 21:08 ` [Qemu-devel] [PATCH 16/22] block: Make zero-after-EOF work with larger alignment Kevin Wolf
2013-12-11 21:08 ` [Qemu-devel] [PATCH 17/22] block: Generalise and optimise COR serialisation Kevin Wolf
2013-12-11 21:08 ` [Qemu-devel] [PATCH 18/22] block: Make overlap range for serialisation dynamic Kevin Wolf
2013-12-11 21:08 ` [Qemu-devel] [PATCH 19/22] block: Align requests in bdrv_co_do_pwritev() Kevin Wolf
2013-12-11 21:08 ` [Qemu-devel] [PATCH 20/22] block: Change coroutine wrapper to byte granularity Kevin Wolf
2013-12-11 21:08 ` [Qemu-devel] [PATCH 21/22] block: Make bdrv_pread() a bdrv_prwv_co() wrapper Kevin Wolf
2013-12-11 21:08 ` [Qemu-devel] [PATCH 22/22] block: Make bdrv_pwrite() " Kevin Wolf
2013-12-12  6:09 ` [Qemu-devel] [PATCH 00/22] block: Support for 512b-on-4k emulation Peter Lieven
2013-12-12  9:47   ` Kevin Wolf
2013-12-12 10:26     ` Peter Lieven
2013-12-12 12:48     ` Peter Lieven

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.