All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 00/18] Support mismatched host and guest logical block sizes
@ 2012-01-26 17:22 Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 01/18] block: do not rely on open_flags for bdrv_is_snapshot Paolo Bonzini
                   ` (17 more replies)
  0 siblings, 18 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-01-26 17:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf

As in v1, the series is logically split in two parts.

The first six patches ensure that BDRV_O_NO_CACHE is set for all non-file
protocols.  This makes sure that large sectors are handled correctly
for the libiscsi backend.

The next patches add support for arbitrary combinations of host/guest
logical block size even for cache=none.  Having guest_block_size >
host_block_size is unsafe in the presence of power failures, while
having guest_block_size < host_block_size is just (possibly) slower.

Physical block size and minimum I/O size are set to the host block
size in order to minimize ill performance behavior.  This is a small
guest-visible change, but only if you were hosting your images on a
4k-sector partition _and_ using cache != none:

- 4k-sectors and cache=none used to fail horribly

- filesystem images with cache != none cannot detect they are on a
  4k-sector disk, because BLKSSZGET fails

This should be rare enough that it's bearable to trade this for better
defaults.

v1->v2: new patch to set min_io_size and please XFS;
    aesthetic changes to patches 2 and 3 suggested by Zhi;
    rebase around copy-on-read changes.

Paolo Bonzini (18):
  block: do not rely on open_flags for bdrv_is_snapshot
  block: store actual flags in bs->open_flags
  block: pass protocol flags up to the format
  block: non-raw protocols never cache
  block: remove enable_write_cache
  block: move flag bits together
  raw: remove the aligned_buf
  block: rename buffer_alignment to guest_block_size
  block: add host_block_size
  raw: probe host_block_size
  iscsi: save host block size
  block: allow waiting only for overlapping writes
  block: allow waiting at arbitrary granularity
  block: protect against "torn reads" for guest_block_size > host_block_size
  block: align and serialize I/O when guest_block_size < host_block_size
  block: default physical block size to host block size
  qemu-io: add blocksize argument to open
  block: default min_io_size to host block size when doing rmw

 Makefile.objs     |    4 +-
 block.c           |  336 ++++++++++++++++++++++++++++++++++++++++++++++-------
 block.h           |   18 +---
 block/curl.c      |    1 +
 block/iscsi.c     |    3 +
 block/nbd.c       |    1 +
 block/raw-posix.c |   97 +++++++++------
 block/raw-win32.c |   42 +++++++
 block/rbd.c       |    1 +
 block/sheepdog.c  |    1 +
 block/vdi.c       |    1 +
 block_int.h       |   25 ++--
 hw/ide/core.c     |    2 +-
 hw/scsi-disk.c    |    4 +-
 hw/scsi-generic.c |    2 +-
 hw/virtio-blk.c   |    4 +-
 qemu-io.c         |   33 +++++-
 trace-events      |    1 +
 18 files changed, 452 insertions(+), 124 deletions(-)

-- 
1.7.7.6

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 01/18] block: do not rely on open_flags for bdrv_is_snapshot
  2012-01-26 17:22 [Qemu-devel] [PATCH v2 00/18] Support mismatched host and guest logical block sizes Paolo Bonzini
@ 2012-01-26 17:22 ` Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 02/18] block: store actual flags in bs->open_flags Paolo Bonzini
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-01-26 17:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf

The BDRV_O_SNAPSHOT flag is stored in open_flags but not passed to
bdrv_open.  This makes the usage of bs->open_flags wrong in
bdrv_snapshot_goto.

(Instead, bdrv_commit uses the backing file's open_flags and those
flags never include any of BDRV_O_SNAPSHOT, BDRV_O_NO_BACKING
or BDRV_O_RDWR).

We will fix the open_flags soon.  In the meanwhile, do not rely
on the fact that BDRV_O_SNAPSHOT is stored in open_flags.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/block.c b/block.c
index 3621d11..c5a85a2 100644
--- a/block.c
+++ b/block.c
@@ -2494,7 +2494,7 @@ int bdrv_can_snapshot(BlockDriverState *bs)
 
 int bdrv_is_snapshot(BlockDriverState *bs)
 {
-    return !!(bs->open_flags & BDRV_O_SNAPSHOT);
+    return bs->is_temporary;
 }
 
 BlockDriverState *bdrv_snapshots(void)
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 02/18] block: store actual flags in bs->open_flags
  2012-01-26 17:22 [Qemu-devel] [PATCH v2 00/18] Support mismatched host and guest logical block sizes Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 01/18] block: do not rely on open_flags for bdrv_is_snapshot Paolo Bonzini
@ 2012-01-26 17:22 ` Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 03/18] block: pass protocol flags up to the format Paolo Bonzini
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-01-26 17:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf

The passed flags are changed slightly before passing them to
bdrv_open.  Store the same flags in bs->open_flags, so that
they are used correctly in bdrv.  In addition, this way we
will be able to query them and get back consistent values.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c |   38 ++++++++++++++++++--------------------
 1 files changed, 18 insertions(+), 20 deletions(-)

diff --git a/block.c b/block.c
index c5a85a2..a1d2433 100644
--- a/block.c
+++ b/block.c
@@ -565,18 +565,30 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs)
 static int bdrv_open_common(BlockDriverState *bs, const char *filename,
     int flags, BlockDriver *drv)
 {
-    int ret, open_flags;
+    int ret;
 
     assert(drv != NULL);
 
     trace_bdrv_open_common(bs, filename, flags, drv->format_name);
 
+    /*
+     * Clear flags that are internal to the block layer before opening the
+     * image.
+     */
+    bs->open_flags = flags & ~(BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING);
+
+    /*
+     * Snapshots should be writable.
+     */
+    if (bs->is_temporary) {
+        bs->open_flags |= BDRV_O_RDWR;
+    }
+
     bs->file = NULL;
     bs->total_sectors = 0;
     bs->encrypted = 0;
     bs->valid_key = 0;
     bs->sg = 0;
-    bs->open_flags = flags;
     bs->growable = 0;
     bs->buffer_alignment = 512;
 
@@ -596,29 +608,15 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename,
     bs->opaque = g_malloc0(drv->instance_size);
 
     bs->enable_write_cache = !!(flags & BDRV_O_CACHE_WB);
-
-    /*
-     * Clear flags that are internal to the block layer before opening the
-     * image.
-     */
-    open_flags = flags & ~(BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING);
-
-    /*
-     * Snapshots should be writable.
-     */
-    if (bs->is_temporary) {
-        open_flags |= BDRV_O_RDWR;
-    }
-
-    bs->keep_read_only = bs->read_only = !(open_flags & BDRV_O_RDWR);
+    bs->keep_read_only = bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
 
     /* Open the image, either directly or using a protocol */
     if (drv->bdrv_file_open) {
-        ret = drv->bdrv_file_open(bs, filename, open_flags);
+        ret = drv->bdrv_file_open(bs, filename, bs->open_flags);
     } else {
-        ret = bdrv_file_open(&bs->file, filename, open_flags);
+        ret = bdrv_file_open(&bs->file, filename, bs->open_flags);
         if (ret >= 0) {
-            ret = drv->bdrv_open(bs, open_flags);
+            ret = drv->bdrv_open(bs, bs->open_flags);
         }
     }
 
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 03/18] block: pass protocol flags up to the format
  2012-01-26 17:22 [Qemu-devel] [PATCH v2 00/18] Support mismatched host and guest logical block sizes Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 01/18] block: do not rely on open_flags for bdrv_is_snapshot Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 02/18] block: store actual flags in bs->open_flags Paolo Bonzini
@ 2012-01-26 17:22 ` Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 04/18] block: non-raw protocols never cache Paolo Bonzini
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-01-26 17:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf

In the next patches, the protocols will modify bs->open_flags to signify
that they cannot support the exact requested feature set.  Pass the
modified flags to the format.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index a1d2433..5055975 100644
--- a/block.c
+++ b/block.c
@@ -616,6 +616,7 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename,
     } else {
         ret = bdrv_file_open(&bs->file, filename, bs->open_flags);
         if (ret >= 0) {
+            bs->open_flags = bs->file->open_flags;
             ret = drv->bdrv_open(bs, bs->open_flags);
         }
     }
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 04/18] block: non-raw protocols never cache
  2012-01-26 17:22 [Qemu-devel] [PATCH v2 00/18] Support mismatched host and guest logical block sizes Paolo Bonzini
                   ` (2 preceding siblings ...)
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 03/18] block: pass protocol flags up to the format Paolo Bonzini
@ 2012-01-26 17:22 ` Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 05/18] block: remove enable_write_cache Paolo Bonzini
                   ` (13 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-01-26 17:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf

Non-raw protocols never cache their data.  Make this visible by setting
BDRV_O_NOCACHE in the open_flags.  It will be used to prohibit block sizes
smaller than the backend's block size.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/curl.c     |    1 +
 block/iscsi.c    |    2 ++
 block/nbd.c      |    1 +
 block/rbd.c      |    1 +
 block/sheepdog.c |    1 +
 block/vdi.c      |    1 +
 6 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/block/curl.c b/block/curl.c
index e9102e3..e7243dd 100644
--- a/block/curl.c
+++ b/block/curl.c
@@ -409,6 +409,7 @@ static int curl_open(BlockDriverState *bs, const char *filename, int flags)
     curl_multi_setopt( s->multi, CURLMOPT_SOCKETFUNCTION, curl_sock_cb ); 
     curl_multi_do(s);
 
+    bs->open_flags |= BDRV_O_NOCACHE;
     return 0;
 
 out:
diff --git a/block/iscsi.c b/block/iscsi.c
index bd3ca11..55f52c4 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -578,6 +578,8 @@ static int iscsi_open(BlockDriverState *bs, const char *filename, int flags)
         return -EINVAL;
     }
 
+    bs->open_flags |= BDRV_O_NOCACHE;
+
     iscsi_url = iscsi_parse_full_url(iscsi, filename);
     if (iscsi_url == NULL) {
         error_report("Failed to parse URL : %s %s", filename,
diff --git a/block/nbd.c b/block/nbd.c
index 161b299..ca2ccca 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -317,6 +317,7 @@ static int nbd_open(BlockDriverState *bs, const char* filename, int flags)
      */
     result = nbd_establish_connection(bs);
 
+    bs->open_flags |= BDRV_O_NOCACHE;
     return result;
 }
 
diff --git a/block/rbd.c b/block/rbd.c
index 46a8579..e6c8f15 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -494,6 +494,7 @@ static int qemu_rbd_open(BlockDriverState *bs, const char *filename, int flags)
     }
 
     bs->read_only = (s->snap != NULL);
+    bs->open_flags |= BDRV_O_NOCACHE;
 
     s->event_reader_pos = 0;
     r = qemu_pipe(s->fds);
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 9416400..accee00 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -1047,6 +1047,7 @@ static int sd_open(BlockDriverState *bs, const char *filename, int flags)
     s->min_dirty_data_idx = UINT32_MAX;
     s->max_dirty_data_idx = 0;
 
+    bs->open_flags |= BDRV_O_NOCACHE;
     bs->total_sectors = s->inode.vdi_size / SECTOR_SIZE;
     strncpy(s->name, vdi, sizeof(s->name));
     qemu_co_mutex_init(&s->lock);
diff --git a/block/vdi.c b/block/vdi.c
index 6a0011f..fde612f 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -441,6 +441,7 @@ static int vdi_open(BlockDriverState *bs, int flags)
         goto fail;
     }
 
+    bs->open_flags |= BDRV_O_NOCACHE;
     bs->total_sectors = header.disk_size / SECTOR_SIZE;
 
     s->block_size = header.block_size;
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 05/18] block: remove enable_write_cache
  2012-01-26 17:22 [Qemu-devel] [PATCH v2 00/18] Support mismatched host and guest logical block sizes Paolo Bonzini
                   ` (3 preceding siblings ...)
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 04/18] block: non-raw protocols never cache Paolo Bonzini
@ 2012-01-26 17:22 ` Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 06/18] block: move flag bits together Paolo Bonzini
                   ` (12 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-01-26 17:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf

This duplicates a bit in open_flags, we do not need it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c     |    3 +--
 block_int.h |    3 ---
 2 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index 5055975..afc327e 100644
--- a/block.c
+++ b/block.c
@@ -607,7 +607,6 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename,
     bs->drv = drv;
     bs->opaque = g_malloc0(drv->instance_size);
 
-    bs->enable_write_cache = !!(flags & BDRV_O_CACHE_WB);
     bs->keep_read_only = bs->read_only = !(bs->open_flags & BDRV_O_RDWR);
 
     /* Open the image, either directly or using a protocol */
@@ -2040,7 +2039,7 @@ int bdrv_is_sg(BlockDriverState *bs)
 
 int bdrv_enable_write_cache(BlockDriverState *bs)
 {
-    return bs->enable_write_cache;
+    return !!(bs->open_flags & BDRV_O_CACHE_WB);
 }
 
 int bdrv_is_encrypted(BlockDriverState *bs)
diff --git a/block_int.h b/block_int.h
index 7be2988..0bbebe4 100644
--- a/block_int.h
+++ b/block_int.h
@@ -277,9 +277,6 @@ struct BlockDriverState {
     /* the memory alignment required for the buffers handled by this driver */
     int buffer_alignment;
 
-    /* do we need to tell the quest if we have a volatile write cache? */
-    int enable_write_cache;
-
     /* NOTE: the following infos are only hints for real hardware
        drivers. They are not used by the block driver */
     int cyls, heads, secs, translation;
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 06/18] block: move flag bits together
  2012-01-26 17:22 [Qemu-devel] [PATCH v2 00/18] Support mismatched host and guest logical block sizes Paolo Bonzini
                   ` (4 preceding siblings ...)
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 05/18] block: remove enable_write_cache Paolo Bonzini
@ 2012-01-26 17:22 ` Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 07/18] raw: remove the aligned_buf Paolo Bonzini
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-01-26 17:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block_int.h |   17 ++++++++---------
 1 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/block_int.h b/block_int.h
index 0bbebe4..6f23d4a 100644
--- a/block_int.h
+++ b/block_int.h
@@ -222,15 +222,18 @@ struct BlockDriver {
 struct BlockDriverState {
     int64_t total_sectors; /* if we are reading a disk image, give its
                               size in sectors */
-    int read_only; /* if true, the media is read only */
-    int keep_read_only; /* if true, the media was requested to stay read only */
     int open_flags; /* flags used to open the file, re-used for re-open */
-    int encrypted; /* if true, the media is encrypted */
-    int valid_key; /* if true, a valid encryption key has been set */
-    int sg;        /* if true, the device is a /dev/sg* */
     int copy_on_read; /* if true, copy read backing sectors into image
                          note this is a reference count */
 
+    unsigned read_only:1; /* if true, the media is read only */
+    unsigned keep_read_only:1; /* if true, the media was requested to stay read only */
+    unsigned encrypted:1; /* if true, the media is encrypted */
+    unsigned valid_key:1; /* if true, a valid encryption key has been set */
+    unsigned sg:1;        /* if true, the device is a /dev/sg* */
+    unsigned growable:1;  /* if true, the disk can expand beyond total_sectors */
+    unsigned is_temporary:1;   /* if true, the disk was created from a snapshot */
+
     BlockDriver *drv; /* NULL means no media */
     void *opaque;
 
@@ -243,7 +246,6 @@ struct BlockDriverState {
     char backing_file[1024]; /* if non zero, the image is a diff of
                                 this file image */
     char backing_format[16]; /* if non-zero and backing_file exists */
-    int is_temporary;
 
     BlockDriverState *backing_hd;
     BlockDriverState *file;
@@ -271,9 +273,6 @@ struct BlockDriverState {
     uint64_t total_time_ns[BDRV_MAX_IOTYPE];
     uint64_t wr_highest_sector;
 
-    /* Whether the disk can expand beyond total_sectors */
-    int growable;
-
     /* the memory alignment required for the buffers handled by this driver */
     int buffer_alignment;
 
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 07/18] raw: remove the aligned_buf
  2012-01-26 17:22 [Qemu-devel] [PATCH v2 00/18] Support mismatched host and guest logical block sizes Paolo Bonzini
                   ` (5 preceding siblings ...)
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 06/18] block: move flag bits together Paolo Bonzini
@ 2012-01-26 17:22 ` Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 08/18] block: rename buffer_alignment to guest_block_size Paolo Bonzini
                   ` (10 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-01-26 17:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/raw-posix.c |   25 +++----------------------
 1 files changed, 3 insertions(+), 22 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 2ee5d69..007d1d3 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -129,8 +129,6 @@ typedef struct BDRVRawState {
     int use_aio;
     void *aio_ctx;
 #endif
-    uint8_t *aligned_buf;
-    unsigned aligned_buf_size;
 #ifdef CONFIG_XFS
     bool is_xfs : 1;
 #endif
@@ -216,23 +214,10 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
         return ret;
     }
     s->fd = fd;
-    s->aligned_buf = NULL;
-
-    if ((bdrv_flags & BDRV_O_NOCACHE)) {
-        /*
-         * Allocate a buffer for read/modify/write cycles.  Chose the size
-         * pessimistically as we don't know the block size yet.
-         */
-        s->aligned_buf_size = 32 * MAX_BLOCKSIZE;
-        s->aligned_buf = qemu_memalign(MAX_BLOCKSIZE, s->aligned_buf_size);
-        if (s->aligned_buf == NULL) {
-            goto out_close;
-        }
-    }
 
     /* We're falling back to POSIX AIO in some cases so init always */
     if (paio_init() < 0) {
-        goto out_free_buf;
+        goto out_close;
     }
 
 #ifdef CONFIG_LINUX_AIO
@@ -245,7 +230,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
 
         s->aio_ctx = laio_init();
         if (!s->aio_ctx) {
-            goto out_free_buf;
+            goto out_close;
         }
         s->use_aio = 1;
     } else
@@ -264,8 +249,6 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
 
     return 0;
 
-out_free_buf:
-    qemu_vfree(s->aligned_buf);
 out_close:
     close(fd);
     return -errno;
@@ -326,7 +309,7 @@ static BlockDriverAIOCB *raw_aio_submit(BlockDriverState *bs,
      * boundary.  Check if this is the case or tell the low-level
      * driver that it needs to copy the buffer.
      */
-    if (s->aligned_buf) {
+    if ((bs->open_flags & BDRV_O_NOCACHE)) {
         if (!qiov_is_aligned(bs, qiov)) {
             type |= QEMU_AIO_MISALIGNED;
 #ifdef CONFIG_LINUX_AIO
@@ -374,8 +357,6 @@ static void raw_close(BlockDriverState *bs)
     if (s->fd >= 0) {
         close(s->fd);
         s->fd = -1;
-        if (s->aligned_buf != NULL)
-            qemu_vfree(s->aligned_buf);
     }
 }
 
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 08/18] block: rename buffer_alignment to guest_block_size
  2012-01-26 17:22 [Qemu-devel] [PATCH v2 00/18] Support mismatched host and guest logical block sizes Paolo Bonzini
                   ` (6 preceding siblings ...)
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 07/18] raw: remove the aligned_buf Paolo Bonzini
@ 2012-01-26 17:22 ` Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 09/18] block: add host_block_size Paolo Bonzini
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-01-26 17:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf

The alignment field is now set to the value that is promised to the guest,
rather than required by the host.  The next patches will make QEMU aware
of the host-provided values, so make this clear.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c           |   10 +++++-----
 block.h           |    2 +-
 block/raw-posix.c |    2 +-
 block_int.h       |    4 ++--
 hw/ide/core.c     |    2 +-
 hw/scsi-disk.c    |    2 +-
 hw/scsi-generic.c |    2 +-
 hw/virtio-blk.c   |    2 +-
 8 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/block.c b/block.c
index afc327e..2e1ebeb 100644
--- a/block.c
+++ b/block.c
@@ -590,7 +590,7 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename,
     bs->valid_key = 0;
     bs->sg = 0;
     bs->growable = 0;
-    bs->buffer_alignment = 512;
+    bs->guest_block_size = 512;
 
     assert(bs->copy_on_read == 0); /* bdrv_new() and bdrv_close() make it so */
     if ((flags & BDRV_O_RDWR) && (flags & BDRV_O_COPY_ON_READ)) {
@@ -920,7 +920,7 @@ void bdrv_detach_dev(BlockDriverState *bs, void *dev)
     bs->dev = NULL;
     bs->dev_ops = NULL;
     bs->dev_opaque = NULL;
-    bs->buffer_alignment = 512;
+    bs->guest_block_size = 512;
 }
 
 /* TODO change to return DeviceState * when all users are qdevified */
@@ -3604,14 +3604,14 @@ BlockDriverAIOCB *bdrv_aio_ioctl(BlockDriverState *bs,
     return NULL;
 }
 
-void bdrv_set_buffer_alignment(BlockDriverState *bs, int align)
+void bdrv_set_guest_block_size(BlockDriverState *bs, int align)
 {
-    bs->buffer_alignment = align;
+    bs->guest_block_size = align;
 }
 
 void *qemu_blockalign(BlockDriverState *bs, size_t size)
 {
-    return qemu_memalign((bs && bs->buffer_alignment) ? bs->buffer_alignment : 512, size);
+    return qemu_memalign((bs && bs->guest_block_size) ? bs->guest_block_size : 512, size);
 }
 
 void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable)
diff --git a/block.h b/block.h
index cae289b..34dc925 100644
--- a/block.h
+++ b/block.h
@@ -307,7 +307,7 @@ int bdrv_img_create(const char *filename, const char *fmt,
                     const char *base_filename, const char *base_fmt,
                     char *options, uint64_t img_size, int flags);
 
-void bdrv_set_buffer_alignment(BlockDriverState *bs, int align);
+void bdrv_set_guest_block_size(BlockDriverState *bs, int align);
 void *qemu_blockalign(BlockDriverState *bs, size_t size);
 
 #define BDRV_SECTORS_PER_DIRTY_CHUNK 2048
diff --git a/block/raw-posix.c b/block/raw-posix.c
index 007d1d3..49a8c21 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -287,7 +287,7 @@ static int qiov_is_aligned(BlockDriverState *bs, QEMUIOVector *qiov)
     int i;
 
     for (i = 0; i < qiov->niov; i++) {
-        if ((uintptr_t) qiov->iov[i].iov_base % bs->buffer_alignment) {
+        if ((uintptr_t) qiov->iov[i].iov_base % bs->guest_block_size) {
             return 0;
         }
     }
diff --git a/block_int.h b/block_int.h
index 6f23d4a..dfde061 100644
--- a/block_int.h
+++ b/block_int.h
@@ -273,8 +273,8 @@ struct BlockDriverState {
     uint64_t total_time_ns[BDRV_MAX_IOTYPE];
     uint64_t wr_highest_sector;
 
-    /* the memory alignment required for the buffers handled by this driver */
-    int buffer_alignment;
+    /* the memory alignment used by the guest for the buffers handled by this driver */
+    int guest_block_size;
 
     /* NOTE: the following infos are only hints for real hardware
        drivers. They are not used by the block driver */
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 56b219b..9eefdf2 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -1852,7 +1852,7 @@ int ide_init_drive(IDEState *s, BlockDriverState *bs, IDEDriveKind kind,
     s->smart_selftest_count = 0;
     if (kind == IDE_CD) {
         bdrv_set_dev_ops(bs, &ide_cd_block_ops, s);
-        bdrv_set_buffer_alignment(bs, 2048);
+        bdrv_set_guest_block_size(bs, 2048);
     } else {
         if (!bdrv_is_inserted(s->bs)) {
             error_report("Device needs media, but drive is empty");
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 11cfe73..4f370a6 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -1536,7 +1536,7 @@ static int scsi_initfn(SCSIDevice *dev)
     if (s->removable) {
         bdrv_set_dev_ops(s->qdev.conf.bs, &scsi_cd_block_ops, s);
     }
-    bdrv_set_buffer_alignment(s->qdev.conf.bs, s->qdev.blocksize);
+    bdrv_set_guest_block_size(s->qdev.conf.bs, s->qdev.blocksize);
 
     bdrv_iostatus_enable(s->qdev.conf.bs);
     add_boot_device_path(s->qdev.conf.bootindex, &dev->qdev, NULL);
diff --git a/hw/scsi-generic.c b/hw/scsi-generic.c
index 0aebcdd..95d3cda 100644
--- a/hw/scsi-generic.c
+++ b/hw/scsi-generic.c
@@ -184,7 +184,7 @@ static void scsi_read_complete(void * opaque, int ret)
             s->blocksize = ldl_be_p(&r->buf[8]);
             s->max_lba = ldq_be_p(&r->buf[0]);
         }
-        bdrv_set_buffer_alignment(s->conf.bs, s->blocksize);
+        bdrv_set_guest_block_size(s->conf.bs, s->blocksize);
 
         scsi_req_data(&r->req, len);
         if (!r->req.io_canceled) {
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index a5a4396..7c44a1f 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -614,7 +614,7 @@ VirtIODevice *virtio_blk_init(DeviceState *dev, BlockConf *conf,
     register_savevm(dev, "virtio-blk", virtio_blk_id++, 2,
                     virtio_blk_save, virtio_blk_load, s);
     bdrv_set_dev_ops(s->bs, &virtio_block_ops, s);
-    bdrv_set_buffer_alignment(s->bs, conf->logical_block_size);
+    bdrv_set_guest_block_size(s->bs, conf->logical_block_size);
 
     bdrv_iostatus_enable(s->bs);
     add_boot_device_path(conf->bootindex, dev, "/disk@0,0");
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 09/18] block: add host_block_size
  2012-01-26 17:22 [Qemu-devel] [PATCH v2 00/18] Support mismatched host and guest logical block sizes Paolo Bonzini
                   ` (7 preceding siblings ...)
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 08/18] block: rename buffer_alignment to guest_block_size Paolo Bonzini
@ 2012-01-26 17:22 ` Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 10/18] raw: probe host_block_size Paolo Bonzini
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-01-26 17:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c     |    4 +++-
 block_int.h |    3 +++
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/block.c b/block.c
index 2e1ebeb..af41fb3 100644
--- a/block.c
+++ b/block.c
@@ -591,6 +591,7 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename,
     bs->sg = 0;
     bs->growable = 0;
     bs->guest_block_size = 512;
+    bs->host_block_size = 512;
 
     assert(bs->copy_on_read == 0); /* bdrv_new() and bdrv_close() make it so */
     if ((flags & BDRV_O_RDWR) && (flags & BDRV_O_COPY_ON_READ)) {
@@ -616,6 +617,7 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename,
         ret = bdrv_file_open(&bs->file, filename, bs->open_flags);
         if (ret >= 0) {
             bs->open_flags = bs->file->open_flags;
+            bs->host_block_size = bs->file->host_block_size;
             ret = drv->bdrv_open(bs, bs->open_flags);
         }
     }
@@ -3611,7 +3613,7 @@ void bdrv_set_guest_block_size(BlockDriverState *bs, int align)
 
 void *qemu_blockalign(BlockDriverState *bs, size_t size)
 {
-    return qemu_memalign((bs && bs->guest_block_size) ? bs->guest_block_size : 512, size);
+    return qemu_memalign(bs ? bs->host_block_size : 512, size);
 }
 
 void bdrv_set_dirty_tracking(BlockDriverState *bs, int enable)
diff --git a/block_int.h b/block_int.h
index dfde061..06e5d35 100644
--- a/block_int.h
+++ b/block_int.h
@@ -276,6 +276,9 @@ struct BlockDriverState {
     /* the memory alignment used by the guest for the buffers handled by this driver */
     int guest_block_size;
 
+    /* the memory alignment required by the device for the buffers handled by this driver */
+    int host_block_size;
+
     /* NOTE: the following infos are only hints for real hardware
        drivers. They are not used by the block driver */
     int cyls, heads, secs, translation;
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 10/18] raw: probe host_block_size
  2012-01-26 17:22 [Qemu-devel] [PATCH v2 00/18] Support mismatched host and guest logical block sizes Paolo Bonzini
                   ` (8 preceding siblings ...)
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 09/18] block: add host_block_size Paolo Bonzini
@ 2012-01-26 17:22 ` Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 11/18] iscsi: save host block size Paolo Bonzini
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-01-26 17:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf

Use ioctls if possible, else see what alignment it takes for O_DIRECT
to succeed.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/raw-posix.c |   72 ++++++++++++++++++++++++++++++++++++++++------------
 block/raw-win32.c |   42 +++++++++++++++++++++++++++++++
 2 files changed, 97 insertions(+), 17 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index 49a8c21..3537394 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -52,6 +52,7 @@
 #include <sys/param.h>
 #include <linux/cdrom.h>
 #include <linux/fd.h>
+#include <linux/fs.h>
 #endif
 #if defined (__FreeBSD__) || defined(__FreeBSD_kernel__)
 #include <sys/disk.h>
@@ -179,6 +180,58 @@ static int raw_normalize_devicepath(const char **filename)
 }
 #endif
 
+static void raw_probe_alignment(BlockDriverState *bs)
+{
+    BDRVRawState *s = bs->opaque;
+    char *buf;
+    unsigned int sector_size;
+
+    /* For /dev/sg devices the alignment is not really used.  */
+    if (bs->sg) {
+        return;
+    }
+
+    /* For block devices, try to get the actual sector size even if we
+     * do not need it, so that it can be passed down to the guest.
+     */
+#ifdef BLKSSZGET
+    if (ioctl(s->fd, BLKSSZGET, &sector_size) >= 0) {
+        bs->host_block_size = sector_size;
+    }
+#endif
+#ifdef DKIOCGETBLOCKSIZE
+    if (ioctl(s->fd, DKIOCGETBLOCKSIZE, &sector_size) >= 0) {
+        bs->host_block_size = sector_size;
+    }
+#endif
+#ifdef DIOCGSECTORSIZE
+    if (ioctl(s->fd, DIOCGSECTORSIZE, &sector_size) >= 0) {
+        bs->host_block_size = sector_size;
+    }
+#endif
+
+    /* If we could not get the size so far, we can only guess it if the file
+     * was opened with O_DIRECT.  Using the minimal value (512) is okay.
+     * It may or may not be safe if the guest logical block size is >512;
+     * however, we will print a scary message suggesting usage of cache=none.
+     * If they hear our advice, the host block size will be detected correctly
+     * and the scary message will go away.
+     */
+    if (!(bs->open_flags & BDRV_O_NOCACHE)) {
+        return;
+    }
+
+    buf = qemu_memalign(MAX_BLOCKSIZE, MAX_BLOCKSIZE);
+    for (sector_size = 512; sector_size < MAX_BLOCKSIZE; sector_size <<= 1) {
+        /* The buffer must be aligned to sector_size, but not sector_size*2.  */
+        if (pread(s->fd, buf + sector_size, sector_size, 0) >= 0) {
+            break;
+        }
+    }
+    bs->host_block_size = sector_size;
+    qemu_vfree(buf);
+}
+
 static int raw_open_common(BlockDriverState *bs, const char *filename,
                            int bdrv_flags, int open_flags)
 {
@@ -214,6 +267,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename,
         return ret;
     }
     s->fd = fd;
+    raw_probe_alignment(bs);
 
     /* We're falling back to POSIX AIO in some cases so init always */
     if (paio_init() < 0) {
@@ -262,22 +316,6 @@ static int raw_open(BlockDriverState *bs, const char *filename, int flags)
     return raw_open_common(bs, filename, flags, 0);
 }
 
-/* XXX: use host sector size if necessary with:
-#ifdef DIOCGSECTORSIZE
-        {
-            unsigned int sectorsize = 512;
-            if (!ioctl(fd, DIOCGSECTORSIZE, &sectorsize) &&
-                sectorsize > bufsize)
-                bufsize = sectorsize;
-        }
-#endif
-#ifdef CONFIG_COCOA
-        uint32_t blockSize = 512;
-        if ( !ioctl( fd, DKIOCGETBLOCKSIZE, &blockSize ) && blockSize > bufsize) {
-            bufsize = blockSize;
-        }
-#endif
-*/
 
 /*
  * Check if all memory in this vector is sector aligned.
@@ -287,7 +325,7 @@ static int qiov_is_aligned(BlockDriverState *bs, QEMUIOVector *qiov)
     int i;
 
     for (i = 0; i < qiov->niov; i++) {
-        if ((uintptr_t) qiov->iov[i].iov_base % bs->guest_block_size) {
+        if ((uintptr_t) qiov->iov[i].iov_base % bs->host_block_size) {
             return 0;
         }
     }
diff --git a/block/raw-win32.c b/block/raw-win32.c
index e4b0b75..d8b76de 100644
--- a/block/raw-win32.c
+++ b/block/raw-win32.c
@@ -77,6 +77,35 @@ static int set_sparse(int fd)
 				 NULL, 0, NULL, 0, &returned, NULL);
 }
 
+static void raw_probe_alignment(BlockDriverState *bs)
+{
+    BDRVRawState *s = bs->opaque;
+    DWORD sectorsPerCluster, freeClusters, totalClusters, count;
+    DISK_GEOMETRY_EX dg;
+    BOOL status;
+
+    if (s->type == FTYPE_CD) {
+        bs->host_block_size = 2048;
+        return;
+    }
+    if (s->type == FTYPE_HARDDISK) {
+        status = DeviceIoControl(s->hfile, IOCTL_DISK_GET_DRIVE_GEOMETRY_EX,
+                                 NULL, 0, &dg, sizeof(dg), &count, NULL);
+        if (status != 0) {
+            bs->host_block_size = dg.Geometry.BytesPerSector;
+            return;
+        }
+        /* try GetDiskFreeSpace too */
+    }
+
+    if (s->drive_path[0]) {
+        GetDiskFreeSpace(s->drive_path, &sectorsPerCluster,
+                         &dg.Geometry.BytesPerSector,
+                         &freeClusters, &totalClusters);
+        bs->host_block_size = dg.Geometry.BytesPerSector;
+    }
+}
+
 static int raw_open(BlockDriverState *bs, const char *filename, int flags)
 {
     BDRVRawState *s = bs->opaque;
@@ -96,6 +125,18 @@ static int raw_open(BlockDriverState *bs, const char *filename, int flags)
         overlapped |= FILE_FLAG_NO_BUFFERING;
     if (!(flags & BDRV_O_CACHE_WB))
         overlapped |= FILE_FLAG_WRITE_THROUGH;
+
+    if (filename[0] && filename[1] == ':') {
+        snprintf(s->drive_path, sizeof(s->drive_path), "%c:\\", filename[0]);
+    } else if (filename[0] == '\\' && filename[1] == '\\') {
+        s->drive_path[0] = 0;
+    } else {
+        /* Relative path.  */
+        char buf[MAX_PATH];
+        GetCurrentDirectory(MAX_PATH, buf);
+        snprintf(s->drive_path, sizeof(s->drive_path), "%c:\\", buf[0]);
+    }
+
     s->hfile = CreateFile(filename, access_flags,
                           FILE_SHARE_READ, NULL,
                           OPEN_EXISTING, overlapped, NULL);
@@ -106,6 +147,7 @@ static int raw_open(BlockDriverState *bs, const char *filename, int flags)
             return -EACCES;
         return -1;
     }
+    raw_probe_alignment(bs);
     return 0;
 }
 
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 11/18] iscsi: save host block size
  2012-01-26 17:22 [Qemu-devel] [PATCH v2 00/18] Support mismatched host and guest logical block sizes Paolo Bonzini
                   ` (9 preceding siblings ...)
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 10/18] raw: probe host_block_size Paolo Bonzini
@ 2012-01-26 17:22 ` Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 12/18] block: allow waiting only for overlapping writes Paolo Bonzini
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-01-26 17:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf

The iSCSI backend already gets the block size from the READ CAPACITY
command it sends.  Save it so that the generic block layer gets it
too.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/iscsi.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/block/iscsi.c b/block/iscsi.c
index 55f52c4..cc7a379 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -663,6 +663,7 @@ static int iscsi_open(BlockDriverState *bs, const char *filename, int flags)
     if (iscsi_url != NULL) {
         iscsi_destroy_url(iscsi_url);
     }
+    bs->host_block_size = iscsilun->block_size;
     return 0;
 
 failed:
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 12/18] block: allow waiting only for overlapping writes
  2012-01-26 17:22 [Qemu-devel] [PATCH v2 00/18] Support mismatched host and guest logical block sizes Paolo Bonzini
                   ` (10 preceding siblings ...)
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 11/18] iscsi: save host block size Paolo Bonzini
@ 2012-01-26 17:22 ` Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 13/18] block: allow waiting at arbitrary granularity Paolo Bonzini
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-01-26 17:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf

To implement mismatching block size, we will reuse the request tracking
mechanism that is used for copy-on-read.  However, waiting for overlapping
reads is not needed to protect against "torn reads", so add a flag to
wait_for_overlapping_requests.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c |   13 ++++++++++---
 1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/block.c b/block.c
index af41fb3..76e1c6a 100644
--- a/block.c
+++ b/block.c
@@ -1198,7 +1198,7 @@ static bool tracked_request_overlaps(BdrvTrackedRequest *req,
 }
 
 static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
-        int64_t sector_num, int nb_sectors)
+        int64_t sector_num, int nb_sectors, bool writes_only)
 {
     BdrvTrackedRequest *req;
     int64_t cluster_sector_num;
@@ -1214,9 +1214,16 @@ static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
     round_to_clusters(bs, sector_num, nb_sectors,
                       &cluster_sector_num, &cluster_nb_sectors);
 
+    if (writes_only && !(bs->open_flags & BDRV_O_RDWR)) {
+        return;
+    }
+
     do {
         retry = false;
         QLIST_FOREACH(req, &bs->tracked_requests, list) {
+            if (writes_only && !req->is_write) {
+                continue;
+            }
             if (tracked_request_overlaps(req, cluster_sector_num,
                                          cluster_nb_sectors)) {
                 /* Hitting this means there was a reentrant request, for
@@ -1591,7 +1598,7 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
     }
 
     if (bs->copy_on_read_in_flight) {
-        wait_for_overlapping_requests(bs, sector_num, nb_sectors);
+        wait_for_overlapping_requests(bs, sector_num, nb_sectors, false);
     }
 
     tracked_request_begin(&req, bs, sector_num, nb_sectors, false);
@@ -1665,7 +1672,7 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
     }
 
     if (bs->copy_on_read_in_flight) {
-        wait_for_overlapping_requests(bs, sector_num, nb_sectors);
+        wait_for_overlapping_requests(bs, sector_num, nb_sectors, false);
     }
 
     tracked_request_begin(&req, bs, sector_num, nb_sectors, true);
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 13/18] block: allow waiting at arbitrary granularity
  2012-01-26 17:22 [Qemu-devel] [PATCH v2 00/18] Support mismatched host and guest logical block sizes Paolo Bonzini
                   ` (11 preceding siblings ...)
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 12/18] block: allow waiting only for overlapping writes Paolo Bonzini
@ 2012-01-26 17:22 ` Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 14/18] block: protect against "torn reads" for guest_block_size > host_block_size Paolo Bonzini
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-01-26 17:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf

When emulating small logical block sizes, the only overlaps
that matter are at host block size granularity, not cluster.
Make wait_for_overlapping_requests more flexible in this
respect, too.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c |   43 ++++++++++++++++++++++++++++---------------
 1 files changed, 28 insertions(+), 15 deletions(-)

diff --git a/block.c b/block.c
index 76e1c6a..c78ca47 100644
--- a/block.c
+++ b/block.c
@@ -1166,24 +1166,33 @@ static void tracked_request_begin(BdrvTrackedRequest *req,
 /**
  * Round a region to cluster boundaries
  */
-static void round_to_clusters(BlockDriverState *bs,
-                              int64_t sector_num, int nb_sectors,
-                              int64_t *cluster_sector_num,
-                              int *cluster_nb_sectors)
+static void round_sectors(int64_t alignment,
+                          int64_t sector_num, int nb_sectors,
+                          int64_t *cluster_sector_num,
+                          int *cluster_nb_sectors)
 {
-    BlockDriverInfo bdi;
-
-    if (bdrv_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
+    if (alignment == BDRV_SECTOR_SIZE) {
         *cluster_sector_num = sector_num;
         *cluster_nb_sectors = nb_sectors;
     } else {
-        int64_t c = bdi.cluster_size / BDRV_SECTOR_SIZE;
+        int64_t c = alignment / BDRV_SECTOR_SIZE;
         *cluster_sector_num = QEMU_ALIGN_DOWN(sector_num, c);
         *cluster_nb_sectors = QEMU_ALIGN_UP(sector_num - *cluster_sector_num +
                                             nb_sectors, c);
     }
 }
 
+static int64_t get_cluster_size(BlockDriverState *bs)
+{
+    BlockDriverInfo bdi;
+
+    if (bdrv_get_info(bs, &bdi) < 0 || bdi.cluster_size == 0) {
+        return BDRV_SECTOR_SIZE;
+    } else {
+        return bdi.cluster_size / BDRV_SECTOR_SIZE;
+    }
+}
+
 static bool tracked_request_overlaps(BdrvTrackedRequest *req,
                                      int64_t sector_num, int nb_sectors) {
     /*        aaaa   bbbb */
@@ -1198,7 +1207,8 @@ static bool tracked_request_overlaps(BdrvTrackedRequest *req,
 }
 
 static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
-        int64_t sector_num, int nb_sectors, bool writes_only)
+        int64_t sector_num, int nb_sectors,
+        int64_t granularity, bool writes_only)
 {
     BdrvTrackedRequest *req;
     int64_t cluster_sector_num;
@@ -1211,8 +1221,8 @@ static void coroutine_fn wait_for_overlapping_requests(BlockDriverState *bs,
      * CoR read and write operations are atomic and guest writes cannot
      * interleave between them.
      */
-    round_to_clusters(bs, sector_num, nb_sectors,
-                      &cluster_sector_num, &cluster_nb_sectors);
+    round_sectors(granularity, sector_num, nb_sectors,
+                  &cluster_sector_num, &cluster_nb_sectors);
 
     if (writes_only && !(bs->open_flags & BDRV_O_RDWR)) {
         return;
@@ -1532,8 +1542,9 @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BlockDriverState *bs,
     /* Cover entire cluster so no additional backing file I/O is required when
      * allocating cluster in the image file.
      */
-    round_to_clusters(bs, sector_num, nb_sectors,
-                      &cluster_sector_num, &cluster_nb_sectors);
+    round_sectors(get_cluster_size(bs),
+                  sector_num, nb_sectors,
+                  &cluster_sector_num, &cluster_nb_sectors);
 
     trace_bdrv_co_do_copy_on_readv(bs, sector_num, nb_sectors,
                                    cluster_sector_num, cluster_nb_sectors);
@@ -1598,7 +1609,8 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
     }
 
     if (bs->copy_on_read_in_flight) {
-        wait_for_overlapping_requests(bs, sector_num, nb_sectors, false);
+        wait_for_overlapping_requests(bs, sector_num, nb_sectors,
+                                      get_cluster_size(bs), false);
     }
 
     tracked_request_begin(&req, bs, sector_num, nb_sectors, false);
@@ -1672,7 +1684,8 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
     }
 
     if (bs->copy_on_read_in_flight) {
-        wait_for_overlapping_requests(bs, sector_num, nb_sectors, false);
+        wait_for_overlapping_requests(bs, sector_num, nb_sectors,
+                                      get_cluster_size(bs), false);
     }
 
     tracked_request_begin(&req, bs, sector_num, nb_sectors, true);
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 14/18] block: protect against "torn reads" for guest_block_size > host_block_size
  2012-01-26 17:22 [Qemu-devel] [PATCH v2 00/18] Support mismatched host and guest logical block sizes Paolo Bonzini
                   ` (12 preceding siblings ...)
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 13/18] block: allow waiting at arbitrary granularity Paolo Bonzini
@ 2012-01-26 17:22 ` Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 15/18] block: align and serialize I/O when guest_block_size < host_block_size Paolo Bonzini
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-01-26 17:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf

When the guest sees a higher alignment than the host, writes may be
done in multiple steps.  So, reads have to be serialized against
overlapping writes, so that the writes look atomic to the guest.
This is true even when O_DIRECT is not in use.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index c78ca47..683d4a3 100644
--- a/block.c
+++ b/block.c
@@ -1613,6 +1613,16 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
                                       get_cluster_size(bs), false);
     }
 
+    /* When the guest sees a higher alignment than the host, writes may be
+     * done in multiple steps.  So, reads have to be serialized against
+     * overlapping writes, so that the writes look atomic to the guest,
+     * even when O_DIRECT is not in use.
+     */
+    if (bs->guest_block_size > bs->host_block_size) {
+        wait_for_overlapping_requests(bs, sector_num, nb_sectors,
+                                      bs->guest_block_size, true);
+    }
+
     tracked_request_begin(&req, bs, sector_num, nb_sectors, false);
 
     if (flags & BDRV_REQ_COPY_ON_READ) {
@@ -3629,6 +3639,18 @@ BlockDriverAIOCB *bdrv_aio_ioctl(BlockDriverState *bs,
 void bdrv_set_guest_block_size(BlockDriverState *bs, int align)
 {
     bs->guest_block_size = align;
+    if ((bs->open_flags & BDRV_O_RDWR) &&
+        bs->host_block_size < bs->guest_block_size) {
+        error_report("Host block size is %d, guest block size is %d.  Due to partially\n"
+                     "written sectors, power failures may cause data corruption.%s",
+                     bs->host_block_size, bs->guest_block_size,
+
+                     /* The host block size might not be detected correctly if
+                      * the guest is not using O_DIRECT.  */
+                     (bs->open_flags & BDRV_O_NOCACHE) ? "" : "\n"
+                     "If you think this message is wrong, start the guest with cache=none\n"
+                     "and see if it disappears.");
+    }
 }
 
 void *qemu_blockalign(BlockDriverState *bs, size_t size)
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 15/18] block: align and serialize I/O when guest_block_size < host_block_size
  2012-01-26 17:22 [Qemu-devel] [PATCH v2 00/18] Support mismatched host and guest logical block sizes Paolo Bonzini
                   ` (13 preceding siblings ...)
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 14/18] block: protect against "torn reads" for guest_block_size > host_block_size Paolo Bonzini
@ 2012-01-26 17:22 ` Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 16/18] block: default physical block size to host block size Paolo Bonzini
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-01-26 17:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf

When the guest sees a lower alignment than the host, and O_DIRECT
is in place, I/O might not span an integer number of host sectors.
If this is the case, add a few sectors at the beginning and the end.
When reading, copy interesting data from there to the guest buffer.

When writing, merge data from there with data from the guest buffer.
Since writes potentially become read-modify-write sequences, they have
to be serialized against other operations.  Reads are still atomic
(they only need some postprocessing), so they need not be
serialized.

The math is a bit tricky, but luckily all the primitives are already in
cutils.c and iov.c.

On top of this, the guest's buffers might be misaligned with respect
to the host block size.  However, that's caught by raw-posix.c and
posix-aio-compat.c, so we need not care about it.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 Makefile.objs |    4 +-
 block.c       |  174 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 trace-events  |    1 +
 3 files changed, 175 insertions(+), 4 deletions(-)

diff --git a/Makefile.objs b/Makefile.objs
index 06a147b..f37adb1 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -24,7 +24,7 @@ coroutine-obj-$(CONFIG_WIN32) += coroutine-win32.o
 #######################################################################
 # block-obj-y is code used by both qemu system emulation and qemu-img
 
-block-obj-y = cutils.o cache-utils.o qemu-option.o module.o async.o
+block-obj-y = cutils.o cache-utils.o qemu-option.o module.o async.o iov.o
 block-obj-y += nbd.o block.o aio.o aes.o qemu-config.o qemu-progress.o qemu-sockets.o
 block-obj-y += $(coroutine-obj-y) $(qobject-obj-y) $(version-obj-y)
 block-obj-$(CONFIG_POSIX) += posix-aio-compat.o
@@ -158,7 +158,7 @@ endif
 common-obj-y += $(addprefix ui/, $(ui-obj-y))
 common-obj-$(CONFIG_VNC) += $(addprefix ui/, $(vnc-obj-y))
 
-common-obj-y += iov.o acl.o
+common-obj-y += acl.o
 common-obj-$(CONFIG_POSIX) += compatfd.o
 common-obj-y += notify.o event_notifier.o
 common-obj-y += qemu-timer.o qemu-timer-common.o
diff --git a/block.c b/block.c
index 683d4a3..33ccc23 100644
--- a/block.c
+++ b/block.c
@@ -28,6 +28,7 @@
 #include "block_int.h"
 #include "module.h"
 #include "qjson.h"
+#include "iov.h"
 #include "qemu-coroutine.h"
 #include "qmp-commands.h"
 #include "qemu-timer.h"
@@ -1182,6 +1183,21 @@ static void round_sectors(int64_t alignment,
     }
 }
 
+static bool req_is_aligned(int64_t alignment,
+                           int64_t sector_num, int nb_sectors)
+{
+    if (alignment != BDRV_SECTOR_SIZE) {
+        int64_t c = alignment / BDRV_SECTOR_SIZE;
+        if ((sector_num & (c - 1)) != 0) {
+            return false;
+        }
+        if ((nb_sectors & (c - 1)) != 0) {
+            return false;
+        }
+    }
+    return true;
+}
+
 static int64_t get_cluster_size(BlockDriverState *bs)
 {
     BlockDriverInfo bdi;
@@ -1578,6 +1594,135 @@ err:
     return ret;
 }
 
+static int coroutine_fn bdrv_rw_co_align(BlockDriverState *bs,
+    int64_t sector_num, int nb_sectors, QEMUIOVector *qiov, bool is_write)
+{
+    BlockDriver *drv = bs->drv;
+    QEMUIOVector aligned_qiov;
+
+    int alignment = bs->host_block_size / BDRV_SECTOR_SIZE;
+    int64_t io_sector_num;
+    int io_nb_sectors;
+    int head_extra, tail_sectors;
+    int head_sectors = 0;
+    int tail_offset = 0;
+    int ret;
+    uint8_t *head = NULL;
+    uint8_t *tail = NULL;
+
+    trace_bdrv_rw_co_align(bs, sector_num, nb_sectors, is_write);
+    round_sectors(bs->host_block_size, sector_num, nb_sectors,
+                  &io_sector_num, &io_nb_sectors);
+
+    /*
+     * Here is a drawing of some involved quantities.  Double lines
+     * mark points that are aligned to a host block.
+     *
+     *                           io_sector_num + io_nb_sectors ---.
+     *     .-- io_sector_num                                       \
+     *    /                         sector_num + nb_sectors ---.    \
+     *   /______________ ________________  _____  ______________\____\
+     *  ||              |                ||     ||              |     ||
+     *  ||  head_extra  |  head_sectors  || ... || tail_sectors | ... ||
+     *  ||______________|________________||_____||______________|_____||
+     *                   \                \       \
+     *                    `--- sector_num  \       `--- sector_num + tail_offset
+     *                                      \
+     *                                       `--- io_sector_num + alignment
+     */
+    head_extra = sector_num & (alignment - 1);
+    tail_sectors = (sector_num + nb_sectors) & (alignment - 1);
+
+    qemu_iovec_init(&aligned_qiov, qiov->niov + 2);
+
+    /* If the initial sector is not aligned, we need a "head" block.  If a
+     * single block will satisfy the request, we put it in the "head" too.
+     * This way, the tail always starts on an aligned sector.
+     */
+    if (head_extra != 0 || io_nb_sectors == alignment) {
+        head_sectors = MIN(nb_sectors, alignment - head_extra);
+        head = qemu_blockalign(bs, bs->host_block_size);
+        qemu_iovec_add(&aligned_qiov, head, bs->host_block_size);
+    }
+
+    /* If the aligned request is just 1 block, we will read it in head,
+     * so nothing else to do.
+     */
+    if (io_nb_sectors > alignment) {
+        /* Otherwise, see where the last full block ends...  */
+        tail_offset = nb_sectors - tail_sectors;
+        assert(((sector_num + tail_offset) & (alignment - 1)) == 0);
+
+        /* ... add everything after the head and up to it.  */
+        qemu_iovec_copy(&aligned_qiov, qiov,
+                        head_sectors * BDRV_SECTOR_SIZE,
+                        (tail_offset - head_sectors) * BDRV_SECTOR_SIZE);
+
+        /* If the final sector is not aligned, add an extra block at
+         * the end.
+         */
+        if (tail_sectors != 0) {
+            tail = qemu_blockalign(bs, bs->host_block_size);
+            qemu_iovec_add(&aligned_qiov, tail, bs->host_block_size);
+        }
+    }
+
+    assert(head || tail);
+    if (is_write) {
+        QEMUIOVector rmw_qiov;
+        struct iovec rmw_iov;
+
+        /* Merge the user data with data read from the first block...  */
+        if (head != NULL) {
+            rmw_iov.iov_base = head;
+            rmw_iov.iov_len = bs->host_block_size;
+            qemu_iovec_init_external(&rmw_qiov, &rmw_iov, 1);
+            ret = drv->bdrv_co_readv(bs, io_sector_num, alignment, &rmw_qiov);
+            if (ret < 0) {
+                return ret;
+            }
+            iov_to_buf(qiov->iov, qiov->size,
+                       (head + head_extra * BDRV_SECTOR_SIZE), 0,
+                       head_sectors * BDRV_SECTOR_SIZE);
+        }
+
+        /* ... and from the last block.  The two reads could be merged if they
+         * access consecutive blocks, but it is rare and we do not care.  */
+        if (tail != NULL) {
+            rmw_iov.iov_base = tail;
+            rmw_iov.iov_len = bs->host_block_size;
+            qemu_iovec_init_external(&rmw_qiov, &rmw_iov, 1);
+            ret = drv->bdrv_co_readv(bs, sector_num + tail_offset, alignment, &rmw_qiov);
+            if (ret < 0) {
+                return ret;
+            }
+            iov_to_buf(qiov->iov, qiov->size,
+                       tail, tail_offset * BDRV_SECTOR_SIZE,
+                       tail_sectors * BDRV_SECTOR_SIZE);
+        }
+        ret = drv->bdrv_co_writev(bs, io_sector_num, io_nb_sectors, &aligned_qiov);
+    } else {
+        ret = drv->bdrv_co_readv(bs, io_sector_num, io_nb_sectors, &aligned_qiov);
+        /* Copy the misaligned parts that the user wants.  */
+        if (head != NULL) {
+            iov_from_buf(qiov->iov, qiov->size,
+                         (head + head_extra * BDRV_SECTOR_SIZE), 0,
+                         head_sectors * BDRV_SECTOR_SIZE);
+        }
+        if (tail != NULL) {
+            iov_from_buf(qiov->iov, qiov->size,
+                         tail, tail_offset * BDRV_SECTOR_SIZE,
+                         tail_sectors * BDRV_SECTOR_SIZE);
+        }
+    }
+
+    qemu_iovec_destroy(&aligned_qiov);
+    g_free(head);
+    g_free(tail);
+    return ret;
+}
+
+
 /*
  * Handle a read request in coroutine context
  */
@@ -1639,7 +1784,12 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
         }
     }
 
-    ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
+    if ((bs->open_flags & BDRV_O_NOCACHE) &&
+        !req_is_aligned(bs->host_block_size, sector_num, nb_sectors)) {
+        ret = bdrv_rw_co_align(bs, sector_num, nb_sectors, qiov, false);
+    } else {
+        ret = drv->bdrv_co_readv(bs, sector_num, nb_sectors, qiov);
+    }
 
 out:
     tracked_request_end(&req);
@@ -1698,9 +1847,25 @@ static int coroutine_fn bdrv_co_do_writev(BlockDriverState *bs,
                                       get_cluster_size(bs), false);
     }
 
+    /* When the guest sees a lower alignment than the host, and O_DIRECT
+     * is in place, assume that the write will become a read-modify-write
+     * sequence.  These have to be serialized against each other, so that
+     * they look atomic to the guest.
+     */
+    if ((bs->open_flags & BDRV_O_NOCACHE) &&
+        bs->guest_block_size < bs->host_block_size) {
+        wait_for_overlapping_requests(bs, sector_num, nb_sectors,
+                                      bs->host_block_size, true);
+    }
+
     tracked_request_begin(&req, bs, sector_num, nb_sectors, true);
 
-    ret = drv->bdrv_co_writev(bs, sector_num, nb_sectors, qiov);
+    if ((bs->open_flags & BDRV_O_NOCACHE) &&
+        !req_is_aligned(bs->host_block_size, sector_num, nb_sectors)) {
+        ret = bdrv_rw_co_align(bs, sector_num, nb_sectors, qiov, true);
+    } else {
+        ret = drv->bdrv_co_writev(bs, sector_num, nb_sectors, qiov);
+    }
 
     if (bs->dirty_bitmap) {
         set_dirty_bitmap(bs, sector_num, nb_sectors, 1);
@@ -3639,6 +3803,12 @@ BlockDriverAIOCB *bdrv_aio_ioctl(BlockDriverState *bs,
 void bdrv_set_guest_block_size(BlockDriverState *bs, int align)
 {
     bs->guest_block_size = align;
+    if ((bs->open_flags & BDRV_O_NOCACHE) &&
+        bs->guest_block_size < bs->host_block_size) {
+        error_report("Host block size is %d, guest block size is %d.  Direct\n"
+                     "access to the host device may cause performance degradation.",
+                     bs->host_block_size, bs->guest_block_size);
+    }
     if ((bs->open_flags & BDRV_O_RDWR) &&
         bs->host_block_size < bs->guest_block_size) {
         error_report("Host block size is %d, guest block size is %d.  Due to partially\n"
diff --git a/trace-events b/trace-events
index 75f6e17..9328056 100644
--- a/trace-events
+++ b/trace-events
@@ -69,6 +69,7 @@ bdrv_co_copy_on_readv(void *bs, int64_t sector_num, int nb_sector) "bs %p sector
 bdrv_co_writev(void *bs, int64_t sector_num, int nb_sector) "bs %p sector_num %"PRId64" nb_sectors %d"
 bdrv_co_io_em(void *bs, int64_t sector_num, int nb_sectors, int is_write, void *acb) "bs %p sector_num %"PRId64" nb_sectors %d is_write %d acb %p"
 bdrv_co_do_copy_on_readv(void *bs, int64_t sector_num, int nb_sectors, int64_t cluster_sector_num, int cluster_nb_sectors) "bs %p sector_num %"PRId64" nb_sectors %d cluster_sector_num %"PRId64" cluster_nb_sectors %d"
+bdrv_rw_co_align(void *bs, int64_t sector_num, int nb_sectors, bool is_write) "bs %p sector_num %"PRId64" nb_sectors %d is_write %d"
 
 # block/stream.c
 stream_one_iteration(void *s, int64_t sector_num, int nb_sectors, int is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 16/18] block: default physical block size to host block size
  2012-01-26 17:22 [Qemu-devel] [PATCH v2 00/18] Support mismatched host and guest logical block sizes Paolo Bonzini
                   ` (14 preceding siblings ...)
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 15/18] block: align and serialize I/O when guest_block_size < host_block_size Paolo Bonzini
@ 2012-01-26 17:22 ` Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 17/18] block: default min_io_size to host block size when doing rmw Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 18/18] qemu-io: add blocksize argument to open Paolo Bonzini
  17 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-01-26 17:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf

Hopefully, this will help operating systems in making less misaligned
I/O operations.  Not really true for Linux, but why not try.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c |   17 +++++++++++++++++
 block.h |   15 ++-------------
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/block.c b/block.c
index 33ccc23..0fce655 100644
--- a/block.c
+++ b/block.c
@@ -4136,3 +4136,20 @@ bool block_job_is_cancelled(BlockJob *job)
 {
     return job->cancelled;
 }
+
+unsigned int get_physical_block_exp(BlockConf *conf)
+{
+    unsigned int exp = 0, size;
+
+    if (conf->physical_block_size || !conf->bs) {
+        size = conf->physical_block_size;
+    } else {
+        size = conf->bs->host_block_size;
+    }
+
+    for (; size > conf->logical_block_size; size >>= 1) {
+        exp++;
+    }
+
+    return exp;
+}
diff --git a/block.h b/block.h
index 34dc925..4270f67 100644
--- a/block.h
+++ b/block.h
@@ -405,25 +405,14 @@ typedef struct BlockConf {
     uint32_t discard_granularity;
 } BlockConf;
 
-static inline unsigned int get_physical_block_exp(BlockConf *conf)
-{
-    unsigned int exp = 0, size;
-
-    for (size = conf->physical_block_size;
-        size > conf->logical_block_size;
-        size >>= 1) {
-        exp++;
-    }
-
-    return exp;
-}
+unsigned int get_physical_block_exp(BlockConf *conf);
 
 #define DEFINE_BLOCK_PROPERTIES(_state, _conf)                          \
     DEFINE_PROP_DRIVE("drive", _state, _conf.bs),                       \
     DEFINE_PROP_UINT16("logical_block_size", _state,                    \
                        _conf.logical_block_size, 512),                  \
     DEFINE_PROP_UINT16("physical_block_size", _state,                   \
-                       _conf.physical_block_size, 512),                 \
+                       _conf.physical_block_size, 0),                 \
     DEFINE_PROP_UINT16("min_io_size", _state, _conf.min_io_size, 0),  \
     DEFINE_PROP_UINT32("opt_io_size", _state, _conf.opt_io_size, 0),    \
     DEFINE_PROP_INT32("bootindex", _state, _conf.bootindex, -1),        \
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 17/18] block: default min_io_size to host block size when doing rmw
  2012-01-26 17:22 [Qemu-devel] [PATCH v2 00/18] Support mismatched host and guest logical block sizes Paolo Bonzini
                   ` (15 preceding siblings ...)
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 16/18] block: default physical block size to host block size Paolo Bonzini
@ 2012-01-26 17:22 ` Paolo Bonzini
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 18/18] qemu-io: add blocksize argument to open Paolo Bonzini
  17 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-01-26 17:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf

Same as the previous patch.  However, since the min_io_size is a byte
value rather than an exponent, we want to make sure that we pass a 0
when running on 512b-sector disks.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c         |   17 +++++++++++++++++
 block.h         |    1 +
 hw/scsi-disk.c  |    2 +-
 hw/virtio-blk.c |    2 +-
 4 files changed, 20 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 0fce655..4880143 100644
--- a/block.c
+++ b/block.c
@@ -4153,3 +4153,20 @@ unsigned int get_physical_block_exp(BlockConf *conf)
 
     return exp;
 }
+
+unsigned int get_min_io_size(BlockConf *conf)
+{
+    if (conf->min_io_size || !conf->bs) {
+        return conf->min_io_size;
+    }
+
+    /* Ask the OS to avoid read-modify-write cycles.  But when running
+     * on 512b-sector disks, we should not modify the parameters that
+     * guests had seen so far.
+     */
+    if (conf->bs->host_block_size > conf->logical_block_size) {
+        return conf->bs->host_block_size;
+    }
+
+    return 0;
+}
diff --git a/block.h b/block.h
index 4270f67..ad1d18c 100644
--- a/block.h
+++ b/block.h
@@ -406,6 +406,7 @@ typedef struct BlockConf {
 } BlockConf;
 
 unsigned int get_physical_block_exp(BlockConf *conf);
+unsigned int get_min_io_size(BlockConf *conf);
 
 #define DEFINE_BLOCK_PROPERTIES(_state, _conf)                          \
     DEFINE_PROP_DRIVE("drive", _state, _conf.bs),                       \
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 4f370a6..032ccf1 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -429,7 +429,7 @@ static int scsi_disk_emulate_inquiry(SCSIRequest *req, uint8_t *outbuf)
             unsigned int unmap_sectors =
                     s->qdev.conf.discard_granularity / s->qdev.blocksize;
             unsigned int min_io_size =
-                    s->qdev.conf.min_io_size / s->qdev.blocksize;
+                    get_min_io_size(&s->qdev.conf) / s->qdev.blocksize;
             unsigned int opt_io_size =
                     s->qdev.conf.opt_io_size / s->qdev.blocksize;
 
diff --git a/hw/virtio-blk.c b/hw/virtio-blk.c
index 7c44a1f..06b5c0d 100644
--- a/hw/virtio-blk.c
+++ b/hw/virtio-blk.c
@@ -491,7 +491,7 @@ static void virtio_blk_update_config(VirtIODevice *vdev, uint8_t *config)
     stl_raw(&blkcfg.seg_max, 128 - 2);
     stw_raw(&blkcfg.cylinders, cylinders);
     stl_raw(&blkcfg.blk_size, blk_size);
-    stw_raw(&blkcfg.min_io_size, s->conf->min_io_size / blk_size);
+    stw_raw(&blkcfg.min_io_size, get_min_io_size(s->conf) / blk_size);
     stw_raw(&blkcfg.opt_io_size, s->conf->opt_io_size / blk_size);
     blkcfg.heads = heads;
     blkcfg.sectors = secs & ~s->sector_mask;
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [Qemu-devel] [PATCH v2 18/18] qemu-io: add blocksize argument to open
  2012-01-26 17:22 [Qemu-devel] [PATCH v2 00/18] Support mismatched host and guest logical block sizes Paolo Bonzini
                   ` (16 preceding siblings ...)
  2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 17/18] block: default min_io_size to host block size when doing rmw Paolo Bonzini
@ 2012-01-26 17:22 ` Paolo Bonzini
  17 siblings, 0 replies; 19+ messages in thread
From: Paolo Bonzini @ 2012-01-26 17:22 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf

Make it possible to test the alignment code using qemu-io.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 qemu-io.c |   33 ++++++++++++++++++++++++++++-----
 1 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/qemu-io.c b/qemu-io.c
index 7c446b6..268db89 100644
--- a/qemu-io.c
+++ b/qemu-io.c
@@ -1594,7 +1594,7 @@ static const cmdinfo_t close_cmd = {
     .oneline    = "close the current open file",
 };
 
-static int openfile(char *name, int flags, int growable)
+static int openfile(char *name, int flags, int growable, int blocksize)
 {
     if (bs) {
         fprintf(stderr, "file open already, try 'help close'\n");
@@ -1615,6 +1615,17 @@ static int openfile(char *name, int flags, int growable)
             bs = NULL;
             return 1;
         }
+        if (blocksize) {
+            if (blocksize < bs->host_block_size && (flags & BDRV_O_NOCACHE)) {
+                fprintf(stderr, "%s: block size cannot be smaller than the actual size\n", progname);
+            } else {
+                if (blocksize > 512) {
+                    /* Always go through the RMW logic.  */
+                    bs->open_flags |= BDRV_O_NOCACHE;
+                }
+                bs->host_block_size = blocksize;
+            }
+        }
     }
 
     return 0;
@@ -1633,7 +1644,8 @@ static void open_help(void)
 " -r, -- open file read-only\n"
 " -s, -- use snapshot file\n"
 " -n, -- disable host cache\n"
-" -g, -- allow file to grow (only applies to protocols)"
+" -g, -- allow file to grow (only applies to protocols)\n"
+" -bSIZE, -- behave as if the host block size was SIZE\n"
 "\n");
 }
 
@@ -1656,9 +1668,11 @@ static int open_f(int argc, char **argv)
     int flags = 0;
     int readonly = 0;
     int growable = 0;
+    long blocksize = 0;
+    char *endptr = NULL;
     int c;
 
-    while ((c = getopt(argc, argv, "snrg")) != EOF) {
+    while ((c = getopt(argc, argv, "snrgb:")) != EOF) {
         switch (c) {
         case 's':
             flags |= BDRV_O_SNAPSHOT;
@@ -1672,6 +1686,15 @@ static int open_f(int argc, char **argv)
         case 'g':
             growable = 1;
             break;
+        case 'b':
+            blocksize = strtol(optarg, &endptr, 0);
+            if (blocksize < 512 || blocksize > 65536 ||
+                (blocksize & (blocksize - 1)) || *endptr != '\0') {
+                printf("The block size must be a power of two "
+                       "between 512 and 65536.\n");
+                return -1;
+            }
+            break;
         default:
             return command_usage(&open_cmd);
         }
@@ -1685,7 +1708,7 @@ static int open_f(int argc, char **argv)
         return command_usage(&open_cmd);
     }
 
-    return openfile(argv[optind], flags, growable);
+    return openfile(argv[optind], flags, growable, blocksize);
 }
 
 static int init_args_command(int index)
@@ -1825,7 +1848,7 @@ int main(int argc, char **argv)
     }
 
     if ((argc - optind) == 1) {
-        openfile(argv[optind], flags, growable);
+        openfile(argv[optind], flags, growable, 0);
     }
     command_loop();
 
-- 
1.7.7.6

^ permalink raw reply related	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2012-01-26 17:24 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-26 17:22 [Qemu-devel] [PATCH v2 00/18] Support mismatched host and guest logical block sizes Paolo Bonzini
2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 01/18] block: do not rely on open_flags for bdrv_is_snapshot Paolo Bonzini
2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 02/18] block: store actual flags in bs->open_flags Paolo Bonzini
2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 03/18] block: pass protocol flags up to the format Paolo Bonzini
2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 04/18] block: non-raw protocols never cache Paolo Bonzini
2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 05/18] block: remove enable_write_cache Paolo Bonzini
2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 06/18] block: move flag bits together Paolo Bonzini
2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 07/18] raw: remove the aligned_buf Paolo Bonzini
2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 08/18] block: rename buffer_alignment to guest_block_size Paolo Bonzini
2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 09/18] block: add host_block_size Paolo Bonzini
2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 10/18] raw: probe host_block_size Paolo Bonzini
2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 11/18] iscsi: save host block size Paolo Bonzini
2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 12/18] block: allow waiting only for overlapping writes Paolo Bonzini
2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 13/18] block: allow waiting at arbitrary granularity Paolo Bonzini
2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 14/18] block: protect against "torn reads" for guest_block_size > host_block_size Paolo Bonzini
2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 15/18] block: align and serialize I/O when guest_block_size < host_block_size Paolo Bonzini
2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 16/18] block: default physical block size to host block size Paolo Bonzini
2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 17/18] block: default min_io_size to host block size when doing rmw Paolo Bonzini
2012-01-26 17:22 ` [Qemu-devel] [PATCH v2 18/18] qemu-io: add blocksize argument to open Paolo Bonzini

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.