All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format
@ 2009-07-03 19:24 Stefan Weil
  2009-07-03 19:29 ` [Qemu-devel] [PATCH] Check availability of uuid header / lib Stefan Weil
                   ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Stefan Weil @ 2009-07-03 19:24 UTC (permalink / raw)
  To: QEMU Developers

Hello,

this mail will be followed by two patches which
allow QEMU to create, read and write VDI images.
VDI is the native image format of SUN's VirtualBox emulator.

The code was written from scratch for QEMU, while earlier patches
sent to Qemu-devel were wrappers for code from VirtualBox. See
http://lists.gnu.org/archive/html/qemu-devel/2008-07/msg00366.html
for those patches.

Patch 1 adds uuid support to QEMU and can be used independent of
the second patch (look for uuid in vl.c, for example).

Patch 2 adds the VDI block driver. It only needs uuid support
when a new image is created (without uuid, it will create an image
with zeroed "uuid" values), so this second patch is semi-independent
of the first.

Please test, comment and add both patches to QEMU master
(if there are no objections).

Regards

Stefan Weil

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH] Check availability of uuid header / lib
  2009-07-03 19:24 [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format Stefan Weil
@ 2009-07-03 19:29 ` Stefan Weil
  2009-07-03 19:29   ` [Qemu-devel] [PATCH] Add new block driver for the VDI format Stefan Weil
  2009-07-06 13:37 ` [Qemu-devel] [PATCH] RFC: " Anthony Liguori
  2009-07-23 15:58 ` [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format (aio version) Stefan Weil
  2 siblings, 1 reply; 44+ messages in thread
From: Stefan Weil @ 2009-07-03 19:29 UTC (permalink / raw)
  To: QEMU Developers

The Universally Unique Identifier library will be used
for the new vdi block driver and maybe other parts of QEMU.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
---
 Makefile        |    1 +
 Makefile.target |    2 ++
 configure       |   21 +++++++++++++++++++++
 3 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/Makefile b/Makefile
index 66c28e5..b6bb41a 100644
--- a/Makefile
+++ b/Makefile
@@ -33,6 +33,7 @@ else
 DOCS=
 endif
 
+LIBS+=$(CONFIG_UUID_LIBS)
 LIBS+=$(PTHREADLIBS)
 LIBS+=$(CLOCKLIBS)
 
diff --git a/Makefile.target b/Makefile.target
index a593503..21f9b3e 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -541,6 +541,8 @@ ifdef CONFIG_BLUEZ
 LIBS += $(CONFIG_BLUEZ_LIBS)
 endif
 
+LIBS += $(CONFIG_UUID_LIBS)
+
 # xen backend driver support
 obj-$(CONFIG_XEN) += xen_machine_pv.o xen_domainbuild.o
 ifeq ($(CONFIG_XEN), yes)
diff --git a/configure b/configure
index 73cc6b1..aab2c33 100755
--- a/configure
+++ b/configure
@@ -972,6 +972,22 @@ if $cc $ARCH_CFLAGS -o $TMPE $TMPC > /dev/null 2> /dev/null ; then
 fi
 
 ##########################################
+# uuid_generate() probe, used for vdi block driver
+uuid="no"
+cat > $TMPC << EOF
+#include <uuid/uuid.h>
+int main(void)
+{
+    uuid_t my_uuid;
+    uuid_generate(my_uuid);
+    return 0;
+}
+EOF
+if $cc $ARCH_CFLAGS -o $TMPE $TMPC -luuid >/dev/null 2>&1; then
+   uuid="yes"
+fi
+
+##########################################
 # vde libraries probe
 if test "$vde" = "yes" ; then
   cat > $TMPC << EOF
@@ -1453,6 +1469,7 @@ echo "Install blobs     $blobs"
 echo -e "KVM support       $kvm"
 echo "fdt support       $fdt"
 echo "preadv support    $preadv"
+echo "uuid support      $uuid"
 
 if test $sdl_too_old = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -1697,6 +1714,10 @@ fi
 if test "$fnmatch" = "yes" ; then
   echo "#define HAVE_FNMATCH_H 1" >> $config_h
 fi
+if test "$uuid" = "yes" ; then
+  echo "#define HAVE_UUID_H 1" >> $config_h
+  echo "CONFIG_UUID_LIBS=-luuid" >> $config_mak
+fi
 qemu_version=`head $source_path/VERSION`
 echo "VERSION=$qemu_version" >>$config_mak
 echo "#define QEMU_VERSION \"$qemu_version\"" >> $config_h
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH] Add new block driver for the VDI format
  2009-07-03 19:29 ` [Qemu-devel] [PATCH] Check availability of uuid header / lib Stefan Weil
@ 2009-07-03 19:29   ` Stefan Weil
  2009-07-05  8:05     ` Christoph Hellwig
  2009-07-05 14:44     ` Kevin Wolf
  0 siblings, 2 replies; 44+ messages in thread
From: Stefan Weil @ 2009-07-03 19:29 UTC (permalink / raw)
  To: QEMU Developers

This is a new block driver written from scratch
to support the VDI format in QEMU.

VDI is the native format used by Innotek / SUN VirtualBox.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
---
 Makefile    |    4 +-
 block/vdi.c |  598 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 600 insertions(+), 2 deletions(-)
 create mode 100644 block/vdi.c

diff --git a/Makefile b/Makefile
index b6bb41a..56815c7 100644
--- a/Makefile
+++ b/Makefile
@@ -67,8 +67,8 @@ recurse-all: $(SUBDIR_RULES)
 #######################################################################
 # block-obj-y is code used by both qemu system emulation and qemu-img
 
-block-obj-y = cutils.o cache-utils.o qemu-malloc.o qemu-option.o module.o
-block-obj-y += block/cow.o block/qcow.o aes.o block/vmdk.o block/cloop.o
+block-obj-y = cutils.o cache-utils.o qemu-malloc.o qemu-option.o aes.o module.o
+block-obj-y += block/cow.o block/qcow.o block/vdi.o block/vmdk.o block/cloop.o
 block-obj-y += block/dmg.o block/bochs.o block/vpc.o block/vvfat.o
 block-obj-y += block/qcow2.o block/qcow2-refcount.o block/qcow2-cluster.o
 block-obj-y += block/qcow2-snapshot.o
diff --git a/block/vdi.c b/block/vdi.c
new file mode 100644
index 0000000..78e223c
--- /dev/null
+++ b/block/vdi.c
@@ -0,0 +1,598 @@
+/*
+ * Block driver for the Virtual Disk Image (VDI) format
+ *
+ * Copyright (c) 2009 Stefan Weil
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) version 3 or any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Reference:
+ * http://forums.virtualbox.org/viewtopic.php?t=8046
+ *
+ * This driver supports create / read / write operations on VDI images.
+ *
+ * Some features like snapshots are still missing (see TODO in code).
+ * Deallocation of zero-filled clusters is missing, too
+ * (might be added to common block layer).
+ * Asynchronous read / write support could be added, too.
+ */
+
+#include "qemu-common.h"
+#include "block_int.h"
+#include "module.h"
+
+#if defined(HAVE_UUID_H)
+#include <uuid/uuid.h>
+#endif
+
+/* Enable debug messages. */
+//~ #define CONFIG_VDI_DEBUG
+
+/* Support experimental write operations on VDI images. */
+#define CONFIG_VDI_WRITE
+
+/* Support snapshot images. */
+//~ #define CONFIG_VDI_SNAPSHOT
+
+/* Enable (currently) unsupported features. */
+//~ #define CONFIG_VDI_UNSUPPORTED
+
+/* Support non-standard cluster (block) size. */
+//~ #define CONFIG_VDI_CLUSTER_SIZE
+
+#define KiB     1024
+#define MiB     (KiB * KiB)
+
+#if defined(CONFIG_VDI_DEBUG)
+#define logout(fmt, ...) \
+                fprintf(stderr, "vdi\t%-24s" fmt, __func__, ##__VA_ARGS__)
+#else
+#define logout(fmt, ...) ((void)0)
+#endif
+
+#define SECTOR_SIZE 512
+
+/* Image signature. */
+#define VDI_SIGNATURE 0xbeda107f
+
+/* Image version. */
+#define VDI_VERSION_1_1 0x00010001
+
+/* Image type. */
+#define VDI_TYPE_DYNAMIC 1
+#define VDI_TYPE_FIXED  2
+
+/* Innotek / SUN images use these strings in header.text:
+ * "<<< innotek VirtualBox Disk Image >>>\n"
+ * "<<< Sun xVM VirtualBox Disk Image >>>\n"
+ * "<<< Sun VirtualBox Disk Image >>>\n"
+ * The value does not matter, so QEMU created images use a different text.
+ */
+#define VDI_TEXT "<<< QEMU VM Virtual Disk Image >>>\n"
+
+#if !defined(HAVE_UUID_H)
+typedef unsigned char uuid_t[16];
+#endif
+
+typedef struct {
+    char text[0x40];
+    uint32_t signature;
+    uint32_t version;
+    uint32_t header_size;
+    uint32_t image_type;
+    uint32_t image_flags;
+    char description[256];
+    uint32_t offset_blockmap;
+    uint32_t offset_data;
+    uint32_t cylinders;         /* disk geometry, unused here */
+    uint32_t heads;             /* disk geometry, unused here */
+    uint32_t sectors;           /* disk geometry, unused here */
+    uint32_t sector_size;
+    uint32_t unused1;
+    uint64_t disk_size;
+    uint32_t block_size;
+    uint32_t block_extra;       /* unused here */
+    uint32_t blocks_in_image;
+    uint32_t blocks_allocated;
+    uuid_t uuid_image;
+    uuid_t uuid_last_snap;
+    uuid_t uuid_link;
+    uuid_t uuid_parent;
+    uint64_t unused2[7];
+} VdiHeader;
+
+typedef struct BDRVVdiState {
+    BlockDriverState *hd;
+    uint32_t *blockmap;
+    /* Size of cluster (bytes). */
+    uint32_t cluster_size;
+    /* Size of cluster (sectors). */
+    uint32_t cluster_sectors;
+    VdiHeader header;
+} BDRVVdiState;
+
+static void vdi_header_to_cpu(VdiHeader *header)
+{
+    le32_to_cpus(&header->signature);
+    le32_to_cpus(&header->version);
+    le32_to_cpus(&header->header_size);
+    le32_to_cpus(&header->image_type);
+    le32_to_cpus(&header->image_flags);
+    le32_to_cpus(&header->offset_blockmap);
+    le32_to_cpus(&header->offset_data);
+    le32_to_cpus(&header->cylinders);
+    le32_to_cpus(&header->heads);
+    le32_to_cpus(&header->sectors);
+    le32_to_cpus(&header->sector_size);
+    le64_to_cpus(&header->disk_size);
+    le32_to_cpus(&header->block_size);
+    le32_to_cpus(&header->block_extra);
+    le32_to_cpus(&header->blocks_in_image);
+    le32_to_cpus(&header->blocks_allocated);
+}
+
+static void vdi_header_to_le(VdiHeader *header)
+{
+    cpu_to_le32s(&header->signature);
+    cpu_to_le32s(&header->version);
+    cpu_to_le32s(&header->header_size);
+    cpu_to_le32s(&header->image_type);
+    cpu_to_le32s(&header->image_flags);
+    cpu_to_le32s(&header->offset_blockmap);
+    cpu_to_le32s(&header->offset_data);
+    cpu_to_le32s(&header->cylinders);
+    cpu_to_le32s(&header->heads);
+    cpu_to_le32s(&header->sectors);
+    cpu_to_le32s(&header->sector_size);
+    cpu_to_le64s(&header->disk_size);
+    cpu_to_le32s(&header->block_size);
+    cpu_to_le32s(&header->block_extra);
+    cpu_to_le32s(&header->blocks_in_image);
+    cpu_to_le32s(&header->blocks_allocated);
+}
+
+static void vdi_header_print(VdiHeader *header)
+{
+    logout("text        %s", header->text);
+    logout("signature   0x%04x\n", header->signature);
+    logout("header size 0x%04x\n", header->header_size);
+    logout("image type  0x%04x\n", header->image_type);
+    logout("image flags 0x%04x\n", header->image_flags);
+    logout("description %s\n", header->description);
+    logout("offset bmap 0x%04x\n", header->offset_blockmap);
+    logout("offset data 0x%04x\n", header->offset_data);
+    logout("cylinders   0x%04x\n", header->cylinders);
+    logout("heads       0x%04x\n", header->heads);
+    logout("sectors     0x%04x\n", header->sectors);
+    logout("sector size 0x%04x\n", header->sector_size);
+    logout("image size  0x%" PRIx64 " B (%" PRIu64 " MiB)\n",
+           header->disk_size, header->disk_size / MiB);
+    logout("block size  0x%04x\n", header->block_size);
+    logout("block extra 0x%04x\n", header->block_extra);
+    logout("blocks tot. 0x%04x\n", header->blocks_in_image);
+    logout("blocks all. 0x%04x\n", header->blocks_allocated);
+}
+
+static int vdi_check(BlockDriverState *bs)
+{
+    /* TODO: missing code. */
+    logout("\n");
+    return -ENOTSUP;
+}
+
+static int vdi_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+    /* TODO: unchecked code. */
+    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
+    logout("\n");
+    bdi->cluster_size = s->cluster_size;
+    bdi->vm_state_offset = -1;
+    return -ENOTSUP;
+}
+
+static int vdi_make_empty(BlockDriverState *bs)
+{
+    /* TODO: missing code. */
+    logout("\n");
+    return -ENOTSUP;
+}
+
+static int vdi_probe(const uint8_t *buf, int buf_size, const char *filename)
+{
+    const VdiHeader *header = (const VdiHeader *)buf;
+    int result = 0;
+
+    if (buf_size < sizeof(*header)) {
+        /* Header too small, no VDI. */
+    } else if (le32_to_cpu(header->signature) == VDI_SIGNATURE) {
+        result = 100;
+    }
+
+    if (result == 0) {
+        logout("no vdi image\n");
+    } else {
+        logout("%s", header->text);
+    }
+
+    return result;
+}
+
+#if defined(CONFIG_VDI_SNAPSHOT)
+static int vdi_snapshot_create(const char *filename, const char *backing_file)
+{
+    /* TODO: missing code. */
+    logout("\n");
+    return -1;
+}
+#endif
+
+static int vdi_open(BlockDriverState *bs, const char *filename, int flags)
+{
+    BDRVVdiState *s = bs->opaque;
+    VdiHeader header;
+    size_t blockmap_size;
+    int ret;
+
+    logout("\n");
+
+    /* Performance is terrible right now with cache=writethrough due mainly
+     * to reference count updates.  If the user does not explicitly specify
+     * a caching type, force to writeback caching.
+     * TODO: This was copied from qcow2.c, maybe it is true for vdi, too.
+     */
+    if ((flags & BDRV_O_CACHE_DEF)) {
+        flags |= BDRV_O_CACHE_WB;
+        flags &= ~BDRV_O_CACHE_DEF;
+    }
+
+    ret = bdrv_file_open(&s->hd, filename, flags);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (bdrv_pread(s->hd, 0, &header, sizeof(header)) != sizeof(header)) {
+        goto fail;
+    }
+
+    vdi_header_to_cpu(&header);
+    vdi_header_print(&header);
+
+    if (header.version != VDI_VERSION_1_1) {
+        logout("unsupported version %u.%u\n",
+               header.version >> 16, header.version & 0xffff);
+        goto fail;
+    } else if (header.offset_blockmap % SECTOR_SIZE != 0) {
+        /* We only support blockmaps which start on a sector boundary. */
+        logout("unsupported blockmap offset 0x%x B\n", header.offset_blockmap);
+        goto fail;
+    } else if (header.offset_data % SECTOR_SIZE != 0) {
+        /* We only support data blocks which start on a sector boundary. */
+        logout("unsupported data offset 0x%x B\n", header.offset_data);
+        goto fail;
+    } else if (header.sector_size != SECTOR_SIZE) {
+        logout("unsupported sector size %u B\n", header.sector_size);
+        goto fail;
+    } else if (header.block_size != 1 * MiB) {
+        logout("unsupported block size %u B\n", header.block_size);
+        goto fail;
+    } else if (header.disk_size !=
+               (uint64_t)header.blocks_in_image * header.block_size) {
+        logout("unexpected block number %u B\n", header.blocks_in_image);
+        goto fail;
+    }
+
+    bs->total_sectors = header.disk_size / SECTOR_SIZE;
+
+    blockmap_size = header.blocks_in_image * sizeof(uint32_t);
+    s->blockmap = qemu_malloc(blockmap_size);
+    if (bdrv_pread(s->hd, header.offset_blockmap, s->blockmap, blockmap_size) != blockmap_size) {
+        goto fail_free_blockmap;
+    }
+
+    /* Blocks (VDI documentation) correspond to clusters (QEMU). */
+    s->cluster_size = header.block_size;
+    s->cluster_sectors = (header.block_size / SECTOR_SIZE);
+    s->header = header;
+    logout("cluster size %u KiB\n", s->cluster_size / KiB);
+
+    return 0;
+
+ fail_free_blockmap:
+    qemu_free(s->blockmap);
+
+ fail:
+    bdrv_delete(s->hd);
+    return -1;
+}
+
+static int vdi_is_allocated(BlockDriverState *bs, int64_t sector_num,
+                             int nb_sectors, int *pnum)
+{
+    /* TODO: Check for too large sector_num (in bdrv_is_allocated or here). */
+    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
+    size_t blockmap_index = sector_num / s->cluster_sectors;
+    size_t sector_in_cluster = sector_num % s->cluster_sectors;
+    int n_sectors = s->cluster_sectors - sector_in_cluster;
+    uint32_t cluster_index = s->blockmap[blockmap_index];
+    logout("%p, %" PRId64 ", %d, %p\n", bs, sector_num, nb_sectors, pnum);
+    if (n_sectors > nb_sectors) {
+        n_sectors = nb_sectors;
+    }
+    *pnum = n_sectors;
+    return cluster_index != UINT32_MAX;
+}
+
+static int vdi_read(BlockDriverState *bs, int64_t sector_num,
+                    uint8_t *buf, int nb_sectors)
+{
+    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
+    logout("%p, %" PRId64 ", %p, %d\n", bs, sector_num, buf, nb_sectors);
+    if (sector_num < 0) {
+        logout("unsupported sector %" PRId64 "\n", sector_num);
+        return -1;
+    }
+    while (nb_sectors > 0 && sector_num < bs->total_sectors) {
+        size_t n_bytes;
+        uint32_t blockmap_entry;
+        size_t block_index = sector_num / s->cluster_sectors;
+        size_t sector_in_cluster = sector_num % s->cluster_sectors;
+        size_t n_sectors = s->cluster_sectors - sector_in_cluster;
+        if (n_sectors > nb_sectors) {
+            n_sectors = nb_sectors;
+        }
+        n_bytes = n_sectors * SECTOR_SIZE;
+        blockmap_entry = s->blockmap[block_index];
+        if (blockmap_entry == UINT32_MAX) {
+            /* Cluster not allocated, return zeros. */
+            memset(buf, 0, n_bytes);
+        } else {
+            uint64_t offset = (uint64_t)s->header.offset_data +
+                (uint64_t)blockmap_entry * s->cluster_size +
+                sector_in_cluster * SECTOR_SIZE;
+            if (bdrv_pread(s->hd, offset, buf, n_bytes) != n_bytes) {
+                logout("read error\n");
+                return -1;
+            }
+        }
+        buf += n_bytes;
+        sector_num += n_sectors;
+        nb_sectors -= n_sectors;
+    }
+    return 0;
+}
+
+#if defined(CONFIG_VDI_WRITE)
+static int vdi_write(BlockDriverState *bs, int64_t sector_num,
+                     const uint8_t *buf, int nb_sectors)
+{
+    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
+    logout("%p, %" PRId64 ", %p, %d\n", bs, sector_num, buf, nb_sectors);
+    if (sector_num < 0) {
+        logout("unsupported sector %" PRId64 "\n", sector_num);
+        return -1;
+    }
+    while (nb_sectors > 0 && sector_num < bs->total_sectors) {
+        size_t n_bytes;
+        uint32_t blockmap_entry;
+        uint64_t offset;
+        size_t block_index = sector_num / s->cluster_sectors;
+        size_t sector_in_cluster = sector_num % s->cluster_sectors;
+        size_t n_sectors = s->cluster_sectors - sector_in_cluster;
+        if (n_sectors > nb_sectors) {
+            n_sectors = nb_sectors;
+        }
+        n_bytes = n_sectors * SECTOR_SIZE;
+        blockmap_entry = s->blockmap[block_index];
+        if (blockmap_entry == UINT32_MAX) {
+            /* Allocate new cluster and write to it. */
+            uint8_t *block;
+            blockmap_entry =
+            s->blockmap[block_index] = s->header.blocks_allocated;
+            s->header.blocks_allocated++;
+            offset = (uint64_t)s->header.offset_data +
+                (uint64_t)blockmap_entry * s->cluster_size;
+            block = qemu_mallocz(s->cluster_size);
+            memcpy(block + sector_in_cluster * SECTOR_SIZE, buf, n_bytes);
+            n_bytes = s->cluster_size;
+            if (bdrv_pwrite(s->hd, offset, block, n_bytes) != n_bytes) {
+                qemu_free(block);
+                logout("write error\n");
+                return -1;
+            }
+            qemu_free(block);
+            /* Write modified sector from block map. */
+            blockmap_entry &= ~(SECTOR_SIZE / sizeof(uint32_t) - 1);
+            offset = (s->header.offset_blockmap +
+                      blockmap_entry * sizeof(uint32_t));
+            if (bdrv_pwrite(s->hd, offset,
+                            &s->blockmap[blockmap_entry],
+                            SECTOR_SIZE) != SECTOR_SIZE) {
+                logout("write error\n");
+                return -1;
+            }
+        } else {
+            /* Write to existing block. */
+            offset = (uint64_t)s->header.offset_data +
+                (uint64_t)blockmap_entry * s->cluster_size +
+                sector_in_cluster * SECTOR_SIZE;
+            if (bdrv_pwrite(s->hd, offset, buf, n_bytes) != n_bytes) {
+                logout("write error\n");
+                return -1;
+            }
+        }
+        buf += n_bytes;
+        sector_num += n_sectors;
+        nb_sectors -= n_sectors;
+    }
+    return 0;
+}
+#endif
+
+static int vdi_create(const char *filename, QEMUOptionParameter *options)
+{
+    int fd;
+    uint64_t bytes = 0;
+    uint32_t clusters;
+    //~ int flags = 0;
+    size_t cluster_size = 1 * MiB;
+    VdiHeader header;
+    size_t i;
+    size_t blockmap_size;
+    uint32_t *blockmap;
+
+    logout("\n");
+
+    /* Read out options. */
+    while (options && options->name) {
+        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
+            bytes = options->value.n;
+#if defined(CONFIG_VDI_CLUSTER_SIZE)
+        } else if (!strcmp(options->name, BLOCK_OPT_CLUSTER_SIZE)) {
+            if (options->value.n) {
+                /* TODO: Additional checks (SECTOR_SIZE * 2^n, ...). */
+                cluster_size = options->value.n;
+            }
+#endif
+        }
+        options++;
+    }
+
+    fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE,
+              0644);
+    if (fd < 0) {
+        return -1;
+    }
+
+    clusters = bytes / cluster_size;
+    blockmap_size = clusters * sizeof(uint32_t);
+    blockmap_size = ((blockmap_size + SECTOR_SIZE - 1) & ~(SECTOR_SIZE -1));
+
+    memset(&header, 0, sizeof(header));
+    strcpy(header.text, VDI_TEXT);
+    header.signature = VDI_SIGNATURE;
+    header.version = VDI_VERSION_1_1;
+    header.header_size = 0x180;
+    header.image_type = VDI_TYPE_DYNAMIC;
+    header.offset_blockmap = 0x200;
+    header.offset_data = 0x200 + blockmap_size;
+    header.sector_size = SECTOR_SIZE;
+    header.disk_size = bytes;
+    header.block_size = cluster_size;
+    header.blocks_in_image = clusters;
+#if defined(HAVE_UUID_H)
+    uuid_generate(header.uuid_image);
+    uuid_generate(header.uuid_last_snap);
+#if 0
+    uuid_generate(header.uuid_link);
+    uuid_generate(header.uuid_parent);
+#endif
+#endif
+    vdi_header_print(&header);
+    vdi_header_to_le(&header);
+    write(fd, &header, sizeof(header));
+
+    blockmap = (uint32_t *)qemu_mallocz(blockmap_size);
+    for (i = 0; i < clusters; i++) {
+        blockmap[i] = UINT32_MAX;
+    }
+    write(fd, blockmap, blockmap_size);
+    qemu_free(blockmap);
+
+    close(fd);
+
+    return 0;
+}
+
+static void vdi_close(BlockDriverState *bs)
+{
+    BDRVVdiState *s = bs->opaque;
+    logout("\n");
+    bdrv_delete(s->hd);
+}
+
+static void vdi_flush(BlockDriverState *bs)
+{
+    BDRVVdiState *s = bs->opaque;
+    logout("\n");
+    bdrv_flush(s->hd);
+}
+
+
+static QEMUOptionParameter vdi_create_options[] = {
+    {
+        .name = BLOCK_OPT_SIZE,
+        .type = OPT_SIZE,
+        .help = "Virtual disk size"
+    },
+#if defined(CONFIG_VDI_CLUSTER_SIZE)
+    {
+        .name = BLOCK_OPT_CLUSTER_SIZE,
+        .type = OPT_SIZE,
+        .help = "vdi cluster size"
+    },
+#endif
+    { NULL }
+};
+
+static BlockDriver bdrv_vdi = {
+    .format_name        = "vdi",
+    .instance_size      = sizeof(BDRVVdiState),
+    .bdrv_probe         = vdi_probe,
+    .bdrv_open          = vdi_open,
+    .bdrv_close         = vdi_close,
+    .bdrv_create        = vdi_create,
+    .bdrv_flush         = vdi_flush,
+#if defined(CONFIG_VDI_UNSUPPORTED)
+    .bdrv_getlength     = vdi_getlength,
+#endif
+    .bdrv_is_allocated  = vdi_is_allocated,
+#if defined(CONFIG_VDI_UNSUPPORTED)
+    .bdrv_set_key       = vdi_set_key,
+#endif
+    .bdrv_make_empty    = vdi_make_empty,
+
+#if defined(CONFIG_VDI_UNSUPPORTED)
+    .bdrv_aio_readv     = vdi_aio_readv,
+    .bdrv_aio_writev    = vdi_aio_writev,
+    .bdrv_write_compressed = vdi_write_compressed,
+#endif
+
+    .bdrv_read          = vdi_read,
+#if defined(CONFIG_VDI_WRITE)
+    .bdrv_write         = vdi_write,
+#endif
+
+#if defined(CONFIG_VDI_SNAPSHOT)
+    .bdrv_snapshot_create   = vdi_snapshot_create,
+    .bdrv_snapshot_goto     = vdi_snapshot_goto,
+    .bdrv_snapshot_delete   = vdi_snapshot_delete,
+    .bdrv_snapshot_list     = vdi_snapshot_list,
+#endif
+    .bdrv_get_info      = vdi_get_info,
+
+#if defined(CONFIG_VDI_UNSUPPORTED)
+    .bdrv_put_buffer    = vdi_put_buffer,
+    .bdrv_get_buffer    = vdi_get_buffer,
+#endif
+
+    .create_options     = vdi_create_options,
+    .bdrv_check         = vdi_check,
+};
+
+static void bdrv_vdi_init(void)
+{
+    logout("\n");
+    bdrv_register(&bdrv_vdi);
+}
+
+block_init(bdrv_vdi_init);
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Add new block driver for the VDI format
  2009-07-03 19:29   ` [Qemu-devel] [PATCH] Add new block driver for the VDI format Stefan Weil
@ 2009-07-05  8:05     ` Christoph Hellwig
  2009-07-05 14:02       ` Stefan Weil
  2009-07-05 14:44     ` Kevin Wolf
  1 sibling, 1 reply; 44+ messages in thread
From: Christoph Hellwig @ 2009-07-05  8:05 UTC (permalink / raw)
  To: Stefan Weil; +Cc: QEMU Developers

On Fri, Jul 03, 2009 at 09:29:46PM +0200, Stefan Weil wrote:
> +/* Enable debug messages. */
> +//~ #define CONFIG_VDI_DEBUG
> +
> +/* Support experimental write operations on VDI images. */
> +#define CONFIG_VDI_WRITE
> +
> +/* Support snapshot images. */
> +//~ #define CONFIG_VDI_SNAPSHOT
> +
> +/* Enable (currently) unsupported features. */
> +//~ #define CONFIG_VDI_UNSUPPORTED
> +
> +/* Support non-standard cluster (block) size. */
> +//~ #define CONFIG_VDI_CLUSTER_SIZE

I don't think we should keep these defines (except for the debug one)
around. CONFIG_VDI_UNSUPPORTED adds methods to the method table that
aren't actually implemented so the code will fail to compile if it's
set.  Similar for CONFIG_VDI_SNAPSHOT except that it implements a single
useless stub.  CONFIG_VDI_CLUSTER_SIZE just adds a harmless option
so it should just be unconditional, too.

I also don't see a reason for the CONFIG_VDI_WRITE ifdef as it's
apparently good enough to be enable by default.

> +static int vdi_check(BlockDriverState *bs)
> +{
> +    /* TODO: missing code. */
> +    logout("\n");
> +    return -ENOTSUP;
> +}

No need to implement this, not having the method gives the same result.

> +static int vdi_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
> +{
> +    /* TODO: unchecked code. */
> +    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
> +    logout("\n");
> +    bdi->cluster_size = s->cluster_size;
> +    bdi->vm_state_offset = -1;
> +    return -ENOTSUP;
> +}

If you return a negative value the result is ignored, so either at least
implement a stub one or just leave out the method.

> +static int vdi_make_empty(BlockDriverState *bs)
> +{
> +    /* TODO: missing code. */
> +    logout("\n");
> +    return -ENOTSUP;
> +}

Again, no need to implement an empty method here.

> +    /* Performance is terrible right now with cache=writethrough due mainly
> +     * to reference count updates.  If the user does not explicitly specify
> +     * a caching type, force to writeback caching.
> +     * TODO: This was copied from qcow2.c, maybe it is true for vdi, too.
> +     */
> +    if ((flags & BDRV_O_CACHE_DEF)) {
> +        flags |= BDRV_O_CACHE_WB;
> +        flags &= ~BDRV_O_CACHE_DEF;
> +    }

And it looks like we're going to change it for qcow2, too..

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Add new block driver for the VDI format
  2009-07-05  8:05     ` Christoph Hellwig
@ 2009-07-05 14:02       ` Stefan Weil
  2009-07-06 10:25         ` Christoph Hellwig
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Weil @ 2009-07-05 14:02 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: QEMU Developers

Christoph Hellwig schrieb:
> On Fri, Jul 03, 2009 at 09:29:46PM +0200, Stefan Weil wrote:
>> +/* Enable debug messages. */
>> +//~ #define CONFIG_VDI_DEBUG
>> +
>> +/* Support experimental write operations on VDI images. */
>> +#define CONFIG_VDI_WRITE
>> +
>> +/* Support snapshot images. */
>> +//~ #define CONFIG_VDI_SNAPSHOT
>> +
>> +/* Enable (currently) unsupported features. */
>> +//~ #define CONFIG_VDI_UNSUPPORTED
>> +
>> +/* Support non-standard cluster (block) size. */
>> +//~ #define CONFIG_VDI_CLUSTER_SIZE
>
> I don't think we should keep these defines (except for the debug one)
> around. CONFIG_VDI_UNSUPPORTED adds methods to the method table that
> aren't actually implemented so the code will fail to compile if it's
> set. Similar for CONFIG_VDI_SNAPSHOT except that it implements a single
> useless stub. CONFIG_VDI_CLUSTER_SIZE just adds a harmless option
> so it should just be unconditional, too.
>
> I also don't see a reason for the CONFIG_VDI_WRITE ifdef as it's
> apparently good enough to be enable by default.
>

CONFIG_VDI_UNSUPPORTED and CONFIG_VDI_SNAPSHOT document
code parts which are still missing or unfinished.
For the same reason, they are undefined, so the unfinished
code is deactivated.

CONFIG_VDI_CLUSTER_SIZE is a harmless option but its code
is unfinished, too. For this reason, it is deactivated by
default.

CONFIG_VDI_WRITE is enabled by default. Users who want to disable
writes can use it to do so.

>> +static int vdi_check(BlockDriverState *bs)
>> +{
>> +    /* TODO: missing code. */
>> +    logout("\n");
>> +    return -ENOTSUP;
>> +}
>
> No need to implement this, not having the method gives the same result.

Not having the method would hide the fact that the
method might be implemented.

vdi_check is unfinished code, and there is even a comment
which says that there remains something to do.

>
>> +static int vdi_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
>> +{
>> +    /* TODO: unchecked code. */
>> +    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
>> +    logout("\n");
>> +    bdi->cluster_size = s->cluster_size;
>> +    bdi->vm_state_offset = -1;
>> +    return -ENOTSUP;
>> +}
>
> If you return a negative value the result is ignored, so either at least
> implement a stub one or just leave out the method.

Again this function exists to document an open question.
If someone can say that the function is complete, the TODO
comment will be removed and it will return zero.

As long as I don't know that it works, it would be dangerous
to return success.
>> +static int vdi_make_empty(BlockDriverState *bs)
>> +{
>> +    /* TODO: missing code. */
>> +    logout("\n");
>> +    return -ENOTSUP;
>> +}
>
> Again, no need to implement an empty method here.

There is the same need for this function like for vdi_check:
the function is just a hook to implement more code.

>
>> +    /* Performance is terrible right now with cache=writethrough due mainly
>> +     * to reference count updates.  If the user does not explicitly specify
>> +     * a caching type, force to writeback caching.
>> +     * TODO: This was copied from qcow2.c, maybe it is true for vdi, too.
>> +     */
>> +    if ((flags & BDRV_O_CACHE_DEF)) {
>> +        flags |= BDRV_O_CACHE_WB;
>> +        flags &= ~BDRV_O_CACHE_DEF;
>> +    }
>
> And it looks like we're going to change it for qcow2, too..

Therefore it was marked with a TODO comment.

To summarize my answer: the code is complete enough to be useful
(create / read / write are implemented). It can be further extended
and optimized, and therefore there are TODO comments and code parts
which show those incomplete or missing parts.

I am glad you had no real objections, so I hope the patches can soon
be commited.

By the way - is it possible to check new block drivers like this one
using qemu-io (can I use an existing test sequence)?

Regards,

Stefan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Add new block driver for the VDI format
  2009-07-03 19:29   ` [Qemu-devel] [PATCH] Add new block driver for the VDI format Stefan Weil
  2009-07-05  8:05     ` Christoph Hellwig
@ 2009-07-05 14:44     ` Kevin Wolf
  1 sibling, 0 replies; 44+ messages in thread
From: Kevin Wolf @ 2009-07-05 14:44 UTC (permalink / raw)
  To: qemu-devel

Hi,

Am Freitag, 3. Juli 2009 21:29 schrieb Stefan Weil:
> This is a new block driver written from scratch
> to support the VDI format in QEMU.
>
> VDI is the native format used by Innotek / SUN VirtualBox.
>
> Signed-off-by: Stefan Weil <weil@mail.berlios.de>

I think it would be a good thing to have a VDI driver, so let's make this 
thing ready for inclusion.

Christoph has already made some comments with which I fully agree. If you 
don't implement something in a useful way, just leave it out for now.

> ---
>  Makefile    |    4 +-
>  block/vdi.c |  598
> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files
> changed, 600 insertions(+), 2 deletions(-)
>  create mode 100644 block/vdi.c
>
> diff --git a/Makefile b/Makefile
> index b6bb41a..56815c7 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -67,8 +67,8 @@ recurse-all: $(SUBDIR_RULES)
>  #######################################################################
>  # block-obj-y is code used by both qemu system emulation and qemu-img
>
> -block-obj-y = cutils.o cache-utils.o qemu-malloc.o qemu-option.o module.o
> -block-obj-y += block/cow.o block/qcow.o aes.o block/vmdk.o block/cloop.o
> +block-obj-y = cutils.o cache-utils.o qemu-malloc.o qemu-option.o aes.o
> module.o +block-obj-y += block/cow.o block/qcow.o block/vdi.o block/vmdk.o
> block/cloop.o block-obj-y += block/dmg.o block/bochs.o block/vpc.o
> block/vvfat.o block-obj-y += block/qcow2.o block/qcow2-refcount.o
> block/qcow2-cluster.o block-obj-y += block/qcow2-snapshot.o
> diff --git a/block/vdi.c b/block/vdi.c
> new file mode 100644
> index 0000000..78e223c
> --- /dev/null
> +++ b/block/vdi.c
> @@ -0,0 +1,598 @@
> +/*
> + * Block driver for the Virtual Disk Image (VDI) format
> + *
> + * Copyright (c) 2009 Stefan Weil
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 2 of the License, or
> + * (at your option) version 3 or any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Reference:
> + * http://forums.virtualbox.org/viewtopic.php?t=8046
> + *
> + * This driver supports create / read / write operations on VDI images.
> + *
> + * Some features like snapshots are still missing (see TODO in code).
> + * Deallocation of zero-filled clusters is missing, too
> + * (might be added to common block layer).
> + * Asynchronous read / write support could be added, too.
> + */

All other block drivers are MIT licensed. You said that you wrote this driver 
from scratch, so is there any reason not to do the same here?

> +
> +#include "qemu-common.h"
> +#include "block_int.h"
> +#include "module.h"
> +
> +#if defined(HAVE_UUID_H)
> +#include <uuid/uuid.h>
> +#endif
> +
> +/* Enable debug messages. */
> +//~ #define CONFIG_VDI_DEBUG
> +
> +/* Support experimental write operations on VDI images. */
> +#define CONFIG_VDI_WRITE
> +
> +/* Support snapshot images. */
> +//~ #define CONFIG_VDI_SNAPSHOT
> +
> +/* Enable (currently) unsupported features. */
> +//~ #define CONFIG_VDI_UNSUPPORTED
> +
> +/* Support non-standard cluster (block) size. */
> +//~ #define CONFIG_VDI_CLUSTER_SIZE
> +
> +#define KiB     1024
> +#define MiB     (KiB * KiB)
> +
> +#if defined(CONFIG_VDI_DEBUG)
> +#define logout(fmt, ...) \
> +                fprintf(stderr, "vdi\t%-24s" fmt, __func__, ##__VA_ARGS__)
> +#else
> +#define logout(fmt, ...) ((void)0)
> +#endif
> +
> +#define SECTOR_SIZE 512
> +
> +/* Image signature. */
> +#define VDI_SIGNATURE 0xbeda107f
> +
> +/* Image version. */
> +#define VDI_VERSION_1_1 0x00010001
> +
> +/* Image type. */
> +#define VDI_TYPE_DYNAMIC 1
> +#define VDI_TYPE_FIXED  2
> +
> +/* Innotek / SUN images use these strings in header.text:
> + * "<<< innotek VirtualBox Disk Image >>>\n"
> + * "<<< Sun xVM VirtualBox Disk Image >>>\n"
> + * "<<< Sun VirtualBox Disk Image >>>\n"
> + * The value does not matter, so QEMU created images use a different text.
> + */
> +#define VDI_TEXT "<<< QEMU VM Virtual Disk Image >>>\n"
> +
> +#if !defined(HAVE_UUID_H)
> +typedef unsigned char uuid_t[16];
> +#endif
> +
> +typedef struct {
> +    char text[0x40];
> +    uint32_t signature;
> +    uint32_t version;
> +    uint32_t header_size;
> +    uint32_t image_type;
> +    uint32_t image_flags;
> +    char description[256];
> +    uint32_t offset_blockmap;
> +    uint32_t offset_data;
> +    uint32_t cylinders;         /* disk geometry, unused here */
> +    uint32_t heads;             /* disk geometry, unused here */
> +    uint32_t sectors;           /* disk geometry, unused here */
> +    uint32_t sector_size;
> +    uint32_t unused1;
> +    uint64_t disk_size;
> +    uint32_t block_size;
> +    uint32_t block_extra;       /* unused here */
> +    uint32_t blocks_in_image;
> +    uint32_t blocks_allocated;
> +    uuid_t uuid_image;
> +    uuid_t uuid_last_snap;
> +    uuid_t uuid_link;
> +    uuid_t uuid_parent;
> +    uint64_t unused2[7];
> +} VdiHeader;
> +
> +typedef struct BDRVVdiState {
> +    BlockDriverState *hd;
> +    uint32_t *blockmap;
> +    /* Size of cluster (bytes). */
> +    uint32_t cluster_size;
> +    /* Size of cluster (sectors). */
> +    uint32_t cluster_sectors;
> +    VdiHeader header;
> +} BDRVVdiState;
> +
> +static void vdi_header_to_cpu(VdiHeader *header)
> +{
> +    le32_to_cpus(&header->signature);
> +    le32_to_cpus(&header->version);
> +    le32_to_cpus(&header->header_size);
> +    le32_to_cpus(&header->image_type);
> +    le32_to_cpus(&header->image_flags);
> +    le32_to_cpus(&header->offset_blockmap);
> +    le32_to_cpus(&header->offset_data);
> +    le32_to_cpus(&header->cylinders);
> +    le32_to_cpus(&header->heads);
> +    le32_to_cpus(&header->sectors);
> +    le32_to_cpus(&header->sector_size);
> +    le64_to_cpus(&header->disk_size);
> +    le32_to_cpus(&header->block_size);
> +    le32_to_cpus(&header->block_extra);
> +    le32_to_cpus(&header->blocks_in_image);
> +    le32_to_cpus(&header->blocks_allocated);
> +}
> +
> +static void vdi_header_to_le(VdiHeader *header)
> +{
> +    cpu_to_le32s(&header->signature);
> +    cpu_to_le32s(&header->version);
> +    cpu_to_le32s(&header->header_size);
> +    cpu_to_le32s(&header->image_type);
> +    cpu_to_le32s(&header->image_flags);
> +    cpu_to_le32s(&header->offset_blockmap);
> +    cpu_to_le32s(&header->offset_data);
> +    cpu_to_le32s(&header->cylinders);
> +    cpu_to_le32s(&header->heads);
> +    cpu_to_le32s(&header->sectors);
> +    cpu_to_le32s(&header->sector_size);
> +    cpu_to_le64s(&header->disk_size);
> +    cpu_to_le32s(&header->block_size);
> +    cpu_to_le32s(&header->block_extra);
> +    cpu_to_le32s(&header->blocks_in_image);
> +    cpu_to_le32s(&header->blocks_allocated);
> +}
> +
> +static void vdi_header_print(VdiHeader *header)
> +{
> +    logout("text        %s", header->text);
> +    logout("signature   0x%04x\n", header->signature);
> +    logout("header size 0x%04x\n", header->header_size);
> +    logout("image type  0x%04x\n", header->image_type);
> +    logout("image flags 0x%04x\n", header->image_flags);
> +    logout("description %s\n", header->description);
> +    logout("offset bmap 0x%04x\n", header->offset_blockmap);
> +    logout("offset data 0x%04x\n", header->offset_data);
> +    logout("cylinders   0x%04x\n", header->cylinders);
> +    logout("heads       0x%04x\n", header->heads);
> +    logout("sectors     0x%04x\n", header->sectors);
> +    logout("sector size 0x%04x\n", header->sector_size);
> +    logout("image size  0x%" PRIx64 " B (%" PRIu64 " MiB)\n",
> +           header->disk_size, header->disk_size / MiB);
> +    logout("block size  0x%04x\n", header->block_size);
> +    logout("block extra 0x%04x\n", header->block_extra);
> +    logout("blocks tot. 0x%04x\n", header->blocks_in_image);
> +    logout("blocks all. 0x%04x\n", header->blocks_allocated);
> +}
> +
> +static int vdi_check(BlockDriverState *bs)
> +{
> +    /* TODO: missing code. */
> +    logout("\n");
> +    return -ENOTSUP;
> +}
> +
> +static int vdi_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
> +{
> +    /* TODO: unchecked code. */
> +    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
> +    logout("\n");
> +    bdi->cluster_size = s->cluster_size;
> +    bdi->vm_state_offset = -1;
> +    return -ENOTSUP;
> +}
> +
> +static int vdi_make_empty(BlockDriverState *bs)
> +{
> +    /* TODO: missing code. */
> +    logout("\n");
> +    return -ENOTSUP;
> +}
> +
> +static int vdi_probe(const uint8_t *buf, int buf_size, const char
> *filename) +{
> +    const VdiHeader *header = (const VdiHeader *)buf;
> +    int result = 0;
> +
> +    if (buf_size < sizeof(*header)) {
> +        /* Header too small, no VDI. */
> +    } else if (le32_to_cpu(header->signature) == VDI_SIGNATURE) {
> +        result = 100;
> +    }
> +
> +    if (result == 0) {
> +        logout("no vdi image\n");
> +    } else {
> +        logout("%s", header->text);
> +    }
> +
> +    return result;
> +}
> +
> +#if defined(CONFIG_VDI_SNAPSHOT)
> +static int vdi_snapshot_create(const char *filename, const char
> *backing_file) +{
> +    /* TODO: missing code. */
> +    logout("\n");
> +    return -1;
> +}
> +#endif
> +
> +static int vdi_open(BlockDriverState *bs, const char *filename, int flags)
> +{
> +    BDRVVdiState *s = bs->opaque;
> +    VdiHeader header;
> +    size_t blockmap_size;
> +    int ret;
> +
> +    logout("\n");
> +
> +    /* Performance is terrible right now with cache=writethrough due
> mainly +     * to reference count updates.  If the user does not explicitly
> specify +     * a caching type, force to writeback caching.
> +     * TODO: This was copied from qcow2.c, maybe it is true for vdi, too.
> +     */
> +    if ((flags & BDRV_O_CACHE_DEF)) {
> +        flags |= BDRV_O_CACHE_WB;
> +        flags &= ~BDRV_O_CACHE_DEF;
> +    }

You're including a workaround for a problem for which you don't even know if 
it exists? You are not serious.

This workaround is going to be removed even for qcow2 next time Anthony 
flushes his patch queue because things have improved (and BDRV_O_CACHE_DEF 
will be gone then, btw). If you're doing the VDI block driver right, you 
won't need it.

> +
> +    ret = bdrv_file_open(&s->hd, filename, flags);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    if (bdrv_pread(s->hd, 0, &header, sizeof(header)) != sizeof(header)) {
> +        goto fail;
> +    }
> +
> +    vdi_header_to_cpu(&header);
> +    vdi_header_print(&header);
> +
> +    if (header.version != VDI_VERSION_1_1) {
> +        logout("unsupported version %u.%u\n",
> +               header.version >> 16, header.version & 0xffff);
> +        goto fail;
> +    } else if (header.offset_blockmap % SECTOR_SIZE != 0) {
> +        /* We only support blockmaps which start on a sector boundary. */
> +        logout("unsupported blockmap offset 0x%x B\n",
> header.offset_blockmap); +        goto fail;
> +    } else if (header.offset_data % SECTOR_SIZE != 0) {
> +        /* We only support data blocks which start on a sector boundary.
> */ +        logout("unsupported data offset 0x%x B\n", header.offset_data);
> +        goto fail;
> +    } else if (header.sector_size != SECTOR_SIZE) {
> +        logout("unsupported sector size %u B\n", header.sector_size);
> +        goto fail;
> +    } else if (header.block_size != 1 * MiB) {
> +        logout("unsupported block size %u B\n", header.block_size);
> +        goto fail;
> +    } else if (header.disk_size !=
> +               (uint64_t)header.blocks_in_image * header.block_size) {
> +        logout("unexpected block number %u B\n", header.blocks_in_image);
> +        goto fail;
> +    }
> +
> +    bs->total_sectors = header.disk_size / SECTOR_SIZE;
> +
> +    blockmap_size = header.blocks_in_image * sizeof(uint32_t);
> +    s->blockmap = qemu_malloc(blockmap_size);
> +    if (bdrv_pread(s->hd, header.offset_blockmap, s->blockmap,
> blockmap_size) != blockmap_size) { +        goto fail_free_blockmap;
> +    }

The block map doesn't have an endianess? You don't seem to convert it here nor 
when you use it in the read/write functions below.

> +
> +    /* Blocks (VDI documentation) correspond to clusters (QEMU). */
> +    s->cluster_size = header.block_size;
> +    s->cluster_sectors = (header.block_size / SECTOR_SIZE);

If "blocks" is what they are called officially, I would just stick to this 
name to avoid confusion. "cluster" is not a term used throughout qemu but a 
concept of qcow2.

> +    s->header = header;
> +    logout("cluster size %u KiB\n", s->cluster_size / KiB);
> +
> +    return 0;
> +
> + fail_free_blockmap:
> +    qemu_free(s->blockmap);
> +
> + fail:
> +    bdrv_delete(s->hd);
> +    return -1;
> +}
> +
> +static int vdi_is_allocated(BlockDriverState *bs, int64_t sector_num,
> +                             int nb_sectors, int *pnum)
> +{
> +    /* TODO: Check for too large sector_num (in bdrv_is_allocated or
> here). */ +    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
> +    size_t blockmap_index = sector_num / s->cluster_sectors;
> +    size_t sector_in_cluster = sector_num % s->cluster_sectors;
> +    int n_sectors = s->cluster_sectors - sector_in_cluster;
> +    uint32_t cluster_index = s->blockmap[blockmap_index];
> +    logout("%p, %" PRId64 ", %d, %p\n", bs, sector_num, nb_sectors, pnum);
> +    if (n_sectors > nb_sectors) {
> +        n_sectors = nb_sectors;
> +    }
> +    *pnum = n_sectors;
> +    return cluster_index != UINT32_MAX;
> +}
> +
> +static int vdi_read(BlockDriverState *bs, int64_t sector_num,
> +                    uint8_t *buf, int nb_sectors)
> +{
> +    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
> +    logout("%p, %" PRId64 ", %p, %d\n", bs, sector_num, buf, nb_sectors);
> +    if (sector_num < 0) {
> +        logout("unsupported sector %" PRId64 "\n", sector_num);
> +        return -1;
> +    }
> +    while (nb_sectors > 0 && sector_num < bs->total_sectors) {
> +        size_t n_bytes;
> +        uint32_t blockmap_entry;
> +        size_t block_index = sector_num / s->cluster_sectors;
> +        size_t sector_in_cluster = sector_num % s->cluster_sectors;
> +        size_t n_sectors = s->cluster_sectors - sector_in_cluster;
> +        if (n_sectors > nb_sectors) {
> +            n_sectors = nb_sectors;
> +        }
> +        n_bytes = n_sectors * SECTOR_SIZE;
> +        blockmap_entry = s->blockmap[block_index];
> +        if (blockmap_entry == UINT32_MAX) {
> +            /* Cluster not allocated, return zeros. */
> +            memset(buf, 0, n_bytes);
> +        } else {
> +            uint64_t offset = (uint64_t)s->header.offset_data +
> +                (uint64_t)blockmap_entry * s->cluster_size +
> +                sector_in_cluster * SECTOR_SIZE;
> +            if (bdrv_pread(s->hd, offset, buf, n_bytes) != n_bytes) {

What about using bdrv_read with sector numbers? bdrv_pread/pwrite always look 
suspicious to me because they are emulated in the non-aligned case (you seem 
to have everything aligned though, so this is purely cosmetical).

> +                logout("read error\n");
> +                return -1;
> +            }
> +        }
> +        buf += n_bytes;
> +        sector_num += n_sectors;
> +        nb_sectors -= n_sectors;
> +    }
> +    return 0;
> +}
> +
> +#if defined(CONFIG_VDI_WRITE)
> +static int vdi_write(BlockDriverState *bs, int64_t sector_num,
> +                     const uint8_t *buf, int nb_sectors)
> +{
> +    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
> +    logout("%p, %" PRId64 ", %p, %d\n", bs, sector_num, buf, nb_sectors);
> +    if (sector_num < 0) {
> +        logout("unsupported sector %" PRId64 "\n", sector_num);
> +        return -1;
> +    }
> +    while (nb_sectors > 0 && sector_num < bs->total_sectors) {
> +        size_t n_bytes;
> +        uint32_t blockmap_entry;
> +        uint64_t offset;
> +        size_t block_index = sector_num / s->cluster_sectors;
> +        size_t sector_in_cluster = sector_num % s->cluster_sectors;
> +        size_t n_sectors = s->cluster_sectors - sector_in_cluster;
> +        if (n_sectors > nb_sectors) {
> +            n_sectors = nb_sectors;
> +        }
> +        n_bytes = n_sectors * SECTOR_SIZE;
> +        blockmap_entry = s->blockmap[block_index];
> +        if (blockmap_entry == UINT32_MAX) {
> +            /* Allocate new cluster and write to it. */
> +            uint8_t *block;
> +            blockmap_entry =
> +            s->blockmap[block_index] = s->header.blocks_allocated;
> +            s->header.blocks_allocated++;
> +            offset = (uint64_t)s->header.offset_data +
> +                (uint64_t)blockmap_entry * s->cluster_size;
> +            block = qemu_mallocz(s->cluster_size);
> +            memcpy(block + sector_in_cluster * SECTOR_SIZE, buf, n_bytes);
> +            n_bytes = s->cluster_size;
> +            if (bdrv_pwrite(s->hd, offset, block, n_bytes) != n_bytes) {

Here again, take care of the endianess.

Kevin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Add new block driver for the VDI format
  2009-07-05 14:02       ` Stefan Weil
@ 2009-07-06 10:25         ` Christoph Hellwig
  2009-07-06 17:19           ` Stefan Weil
  0 siblings, 1 reply; 44+ messages in thread
From: Christoph Hellwig @ 2009-07-06 10:25 UTC (permalink / raw)
  To: Stefan Weil; +Cc: Christoph Hellwig, QEMU Developers

On Sun, Jul 05, 2009 at 04:02:29PM +0200, Stefan Weil wrote:
> CONFIG_VDI_UNSUPPORTED and CONFIG_VDI_SNAPSHOT document
> code parts which are still missing or unfinished.
> For the same reason, they are undefined, so the unfinished
> code is deactivated.

> Not having the method would hide the fact that the
> method might be implemented.
> 
> vdi_check is unfinished code, and there is even a comment
> which says that there remains something to do.

Keeping stubs around as a reminder is very bad coding practice.  You
already have a todo list reminding about the missing features on the
top of the file.  Note that the feature set of your vdi driver is the
same as all the other non-native image format drivers, so it's not
really anything special anyway.

> By the way - is it possible to check new block drivers like this one
> using qemu-io (can I use an existing test sequence)?

I've put support into qemu-iotests to run with the vdi format.  It
passes all test that currently are available for vdi.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format
  2009-07-03 19:24 [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format Stefan Weil
  2009-07-03 19:29 ` [Qemu-devel] [PATCH] Check availability of uuid header / lib Stefan Weil
@ 2009-07-06 13:37 ` Anthony Liguori
  2009-07-06 21:10   ` Stefan Weil
  2009-08-02 14:27   ` Avi Kivity
  2009-07-23 15:58 ` [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format (aio version) Stefan Weil
  2 siblings, 2 replies; 44+ messages in thread
From: Anthony Liguori @ 2009-07-06 13:37 UTC (permalink / raw)
  To: Stefan Weil; +Cc: Kevin Wolf, QEMU Developers, Christoph Hellwig

Stefan Weil wrote:
> Hello,
>
> this mail will be followed by two patches which
> allow QEMU to create, read and write VDI images.
> VDI is the native image format of SUN's VirtualBox emulator.
>
> The code was written from scratch for QEMU, while earlier patches
> sent to Qemu-devel were wrappers for code from VirtualBox. See
> http://lists.gnu.org/archive/html/qemu-devel/2008-07/msg00366.html
> for those patches.
>
> Patch 1 adds uuid support to QEMU and can be used independent of
> the second patch (look for uuid in vl.c, for example).
>
> Patch 2 adds the VDI block driver. It only needs uuid support
> when a new image is created (without uuid, it will create an image
> with zeroed "uuid" values), so this second patch is semi-independent
> of the first.
>   

I'd really like to get rid of synchronous IO functions in the block 
layer.  One way to do this is to insist that all new block drivers only 
implement the AIO functions.

I think we should make this decree but I'd like to know if other people 
think this is unreasonable first.  One potential model of block drivers 
would involve synchronous IO and threads.  I'm not a big fan of that 
model and I don't think it's an easy conversion from today's synchronous 
IO drivers to that model because the locking and re-entrance needs 
careful consideration.

Since it looks like you're caching the full offset table in memory, and 
it's a single level, making it asynchronous should be very easy to do.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Add new block driver for the VDI format
  2009-07-06 10:25         ` Christoph Hellwig
@ 2009-07-06 17:19           ` Stefan Weil
  0 siblings, 0 replies; 44+ messages in thread
From: Stefan Weil @ 2009-07-06 17:19 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: QEMU Developers

Christoph Hellwig schrieb:
> On Sun, Jul 05, 2009 at 04:02:29PM +0200, Stefan Weil wrote:
>   
>> CONFIG_VDI_UNSUPPORTED and CONFIG_VDI_SNAPSHOT document
>> code parts which are still missing or unfinished.
>> For the same reason, they are undefined, so the unfinished
>> code is deactivated.
>>     
>
>   
>> Not having the method would hide the fact that the
>> method might be implemented.
>>
>> vdi_check is unfinished code, and there is even a comment
>> which says that there remains something to do.
>>     
>
> Keeping stubs around as a reminder is very bad coding practice.  You
> already have a todo list reminding about the missing features on the
> top of the file.  Note that the feature set of your vdi driver is the
> same as all the other non-native image format drivers, so it's not
> really anything special anyway.
>   

My opinion about bad coding practices differs from yours.

Why do you think that stubs are even a very bad coding practice?

The potential feature set of the vdi driver is more than that of the
other non-native image format drivers. It is nearer to that of qcow2.

>   
>> By the way - is it possible to check new block drivers like this one
>> using qemu-io (can I use an existing test sequence)?
>>     
>
> I've put support into qemu-iotests to run with the vdi format.  It
> passes all test that currently are available for vdi.
>
>   

Thanks for this feedback. There are still errors in my first driver release
(endianess, block allocation) which will be fixed in the next release.

Regards,

Stefan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format
  2009-07-06 13:37 ` [Qemu-devel] [PATCH] RFC: " Anthony Liguori
@ 2009-07-06 21:10   ` Stefan Weil
  2009-07-06 21:28     ` Anthony Liguori
  2009-07-07  7:55     ` Kevin Wolf
  2009-08-02 14:27   ` Avi Kivity
  1 sibling, 2 replies; 44+ messages in thread
From: Stefan Weil @ 2009-07-06 21:10 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Kevin Wolf, QEMU Developers

Anthony Liguori schrieb:
> Stefan Weil wrote:
>> Hello,
>>
>> this mail will be followed by two patches which
>> allow QEMU to create, read and write VDI images.
>> VDI is the native image format of SUN's VirtualBox emulator.
>>
>> The code was written from scratch for QEMU, while earlier patches
>> sent to Qemu-devel were wrappers for code from VirtualBox. See
>> http://lists.gnu.org/archive/html/qemu-devel/2008-07/msg00366.html
>> for those patches.
>>
>> Patch 1 adds uuid support to QEMU and can be used independent of
>> the second patch (look for uuid in vl.c, for example).
>>
>> Patch 2 adds the VDI block driver. It only needs uuid support
>> when a new image is created (without uuid, it will create an image
>> with zeroed "uuid" values), so this second patch is semi-independent
>> of the first.
>>   
>
> I'd really like to get rid of synchronous IO functions in the block
> layer.  One way to do this is to insist that all new block drivers
> only implement the AIO functions.
>
> I think we should make this decree but I'd like to know if other
> people think this is unreasonable first.  One potential model of block
> drivers would involve synchronous IO and threads.  I'm not a big fan
> of that model and I don't think it's an easy conversion from today's
> synchronous IO drivers to that model because the locking and
> re-entrance needs careful consideration.
>
> Since it looks like you're caching the full offset table in memory,
> and it's a single level, making it asynchronous should be very easy to
> do.
>
> Regards,
>
> Anthony Liguori
>

Yes, at least it should be straight forward to do.
I planned to switch to AIO in a second stage.

It would help if you could already commit the synchronous version
as soon as the endianess issue (detected by Kevin Wolf, thanks)
and an additional bug in the write code are fixed
(code is ready, but still untested - a patch will follow this week).

I don't plan to change the code's license from GPL to MIT.
This is a matter of my personal taste. Is this a problem
for new block driver code?

Regards,

Stefan Weil

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format
  2009-07-06 21:10   ` Stefan Weil
@ 2009-07-06 21:28     ` Anthony Liguori
  2009-07-07  7:55     ` Kevin Wolf
  1 sibling, 0 replies; 44+ messages in thread
From: Anthony Liguori @ 2009-07-06 21:28 UTC (permalink / raw)
  To: Stefan Weil; +Cc: Kevin Wolf, QEMU Developers

Stefan Weil wrote:
> Anthony Liguori schrieb:
>   
> Yes, at least it should be straight forward to do.
> I planned to switch to AIO in a second stage.
>
> It would help if you could already commit the synchronous version
> as soon as the endianess issue (detected by Kevin Wolf, thanks)
> and an additional bug in the write code are fixed
> (code is ready, but still untested - a patch will follow this week).
>   

I'd rather not do that.  The synchronous IO functions are really broken 
in a fundamental way.

> I don't plan to change the code's license from GPL to MIT.
> This is a matter of my personal taste. Is this a problem
> for new block driver code?
>   

GPL is always fine by me.

Regards,

Anthony Liguori

> Regards,
>
> Stefan Weil
>
>   

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format
  2009-07-06 21:10   ` Stefan Weil
  2009-07-06 21:28     ` Anthony Liguori
@ 2009-07-07  7:55     ` Kevin Wolf
  2009-07-07  9:04       ` Jamie Lokier
  2009-07-07 10:30       ` Christoph Hellwig
  1 sibling, 2 replies; 44+ messages in thread
From: Kevin Wolf @ 2009-07-07  7:55 UTC (permalink / raw)
  To: Stefan Weil; +Cc: QEMU Developers

Stefan Weil schrieb:
> It would help if you could already commit the synchronous version
> as soon as the endianess issue (detected by Kevin Wolf, thanks)
> and an additional bug in the write code are fixed
> (code is ready, but still untested - a patch will follow this week).

Maybe somebody should run qemu-iotests for it on a big endian machine to
make sure it works. Unfortunately, I don't have one here.

For the write bug you might want to write a new qemu-iotests test case
as it doesn't seem to be covered yet (at least, Christoph said the
driver passes).

> I don't plan to change the code's license from GPL to MIT.
> This is a matter of my personal taste. Is this a problem
> for new block driver code?

I guess it's okay. It's just that until now the whole block code was
under a single license which I would have liked to retain. But again
just a matter of personal taste.

Kevin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format
  2009-07-07  7:55     ` Kevin Wolf
@ 2009-07-07  9:04       ` Jamie Lokier
  2009-07-07 10:30       ` Christoph Hellwig
  1 sibling, 0 replies; 44+ messages in thread
From: Jamie Lokier @ 2009-07-07  9:04 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: QEMU Developers

Kevin Wolf wrote:
> Stefan Weil schrieb:
> > It would help if you could already commit the synchronous version
> > as soon as the endianess issue (detected by Kevin Wolf, thanks)
> > and an additional bug in the write code are fixed
> > (code is ready, but still untested - a patch will follow this week).
> 
> Maybe somebody should run qemu-iotests for it on a big endian machine to
> make sure it works. Unfortunately, I don't have one here.

I've heard there's some emulator thing people are writing, for when
you don't have a real machine. :-)

-- Jamie

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format
  2009-07-07  7:55     ` Kevin Wolf
  2009-07-07  9:04       ` Jamie Lokier
@ 2009-07-07 10:30       ` Christoph Hellwig
  2009-07-07 10:33         ` Kevin Wolf
  1 sibling, 1 reply; 44+ messages in thread
From: Christoph Hellwig @ 2009-07-07 10:30 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: QEMU Developers

On Tue, Jul 07, 2009 at 09:55:57AM +0200, Kevin Wolf wrote:
> For the write bug you might want to write a new qemu-iotests test case
> as it doesn't seem to be covered yet (at least, Christoph said the
> driver passes).

Yes, the test suite passes.  But currrently the coverage for non-qcow2
image formats is a bit limited as we skip all tests that require an
image check.  I plan to change that soon and run all tests for all image
formats - a lacking qemu-img check command will simply cause the check
to always succeed.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format
  2009-07-07 10:30       ` Christoph Hellwig
@ 2009-07-07 10:33         ` Kevin Wolf
  0 siblings, 0 replies; 44+ messages in thread
From: Kevin Wolf @ 2009-07-07 10:33 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: QEMU Developers

Christoph Hellwig schrieb:
> On Tue, Jul 07, 2009 at 09:55:57AM +0200, Kevin Wolf wrote:
>> For the write bug you might want to write a new qemu-iotests test case
>> as it doesn't seem to be covered yet (at least, Christoph said the
>> driver passes).
> 
> Yes, the test suite passes.  But currrently the coverage for non-qcow2
> image formats is a bit limited as we skip all tests that require an
> image check.  I plan to change that soon and run all tests for all image
> formats - a lacking qemu-img check command will simply cause the check
> to always succeed.

Oh, I see. What you suggest for the checks makes sense to me.

Kevin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format (aio version)
  2009-07-03 19:24 [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format Stefan Weil
  2009-07-03 19:29 ` [Qemu-devel] [PATCH] Check availability of uuid header / lib Stefan Weil
  2009-07-06 13:37 ` [Qemu-devel] [PATCH] RFC: " Anthony Liguori
@ 2009-07-23 15:58 ` Stefan Weil
  2009-07-23 20:27   ` [Qemu-devel] [PATCH] Check availability of uuid header / lib Stefan Weil
                     ` (2 more replies)
  2 siblings, 3 replies; 44+ messages in thread
From: Stefan Weil @ 2009-07-23 15:58 UTC (permalink / raw)
  To: QEMU Developers

Stefan Weil schrieb:
> Hello,
>
> this mail will be followed by two patches which
> allow QEMU to create, read and write VDI images.
> VDI is the native image format of SUN's VirtualBox emulator.
>
> The code was written from scratch for QEMU, while earlier patches
> sent to Qemu-devel were wrappers for code from VirtualBox. See
> http://lists.gnu.org/archive/html/qemu-devel/2008-07/msg00366.html
> for those patches.
>
> Patch 1 adds uuid support to QEMU and can be used independent of
> the second patch (look for uuid in vl.c, for example).
>
> Patch 2 adds the VDI block driver. It only needs uuid support
> when a new image is created (without uuid, it will create an image
> with zeroed "uuid" values), so this second patch is semi-independent
> of the first.
>
> Please test, comment and add both patches to QEMU master
> (if there are no objections).
>
> Regards
>
> Stefan Weil


The new version of the VDI block driver adds these changes:

* Fix allocation of new blocks. The old code did not update the image
header,
  so after a new program start, new allocations had overwritten old ones.
  This is something still untested by qemu-iotests.

* Fix endianess issues. I'm sorry I could not test it up to now,
  but it should work nevertheless :-)

* Support asynchronous i/o. The synchronous code is still included as a
  compile time option, but the default is asynchronous, and it works.
  I'm very sure the aio code can be improved, and some parts are
  even unnecessary (at least I think so), so if you are a
  block driver expert: please review the code and send comments.

* Support checking of VDI images. These consistency checks were very
  helpful during driver development!

* Support static images. Static images use pre-allocated blocks
  while dynamic images (default) allocate blocks on demand.


Here is an overview of the new patches:

* Patch 1 adds UUID support to QEMU. This update of the old patch was
  needed because of changes in QEMU's configure / Makefiles.
  Please commit it to QEMU master, so I don't have to fix and resend it
  when there are additional changes in these files.

* Patch 2 is the new VDI block driver. Reviews and suggestions are
  welcome.

* Patch 3 fixes qemu-io-tests to support the new VDI block driver
  (needed because of new option for static images).


The latest code is also available from http://repo.or.cz/w/qemu/ar7.git

Regards

Stefan Weil

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH] Check availability of uuid header / lib
  2009-07-23 15:58 ` [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format (aio version) Stefan Weil
@ 2009-07-23 20:27   ` Stefan Weil
  2009-07-24  6:32     ` Christoph Egger
  2009-10-01 18:10     ` [Qemu-devel] [PATCH] Check availability of uuid header / library Stefan Weil
  2009-07-23 20:29   ` [Qemu-devel] [PATCH] Add new block driver for the VDI format (use aio) Stefan Weil
  2009-07-23 20:30   ` [Qemu-devel] [PATCH] add support for new option of vdi format Stefan Weil
  2 siblings, 2 replies; 44+ messages in thread
From: Stefan Weil @ 2009-07-23 20:27 UTC (permalink / raw)
  To: QEMU Developers

The Universally Unique Identifier library will be used
for the new vdi block driver and maybe other parts of QEMU.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
---
 Makefile        |    1 +
 Makefile.target |    2 ++
 configure       |   21 +++++++++++++++++++++
 3 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/Makefile b/Makefile
index dc95869..d8fa730 100644
--- a/Makefile
+++ b/Makefile
@@ -29,6 +29,7 @@ else
 DOCS=
 endif
 
+LIBS+=$(UUID_LIBS)
 LIBS+=$(PTHREADLIBS)
 LIBS+=$(CLOCKLIBS)
 
diff --git a/Makefile.target b/Makefile.target
index f9cd42a..4a01e96 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -469,6 +469,8 @@ ifdef CONFIG_BLUEZ
 LIBS += $(CONFIG_BLUEZ_LIBS)
 endif
 
+LIBS += $(UUID_LIBS)
+
 # xen backend driver support
 obj-$(CONFIG_XEN) += xen_machine_pv.o xen_domainbuild.o
 ifeq ($(CONFIG_XEN), y)
diff --git a/configure b/configure
index 39bed79..28a9c48 100755
--- a/configure
+++ b/configure
@@ -995,6 +995,22 @@ if $cc $ARCH_CFLAGS -o $TMPE $TMPC > /dev/null 2> /dev/null ; then
 fi
 
 ##########################################
+# uuid_generate() probe, used for vdi block driver
+uuid="no"
+cat > $TMPC << EOF
+#include <uuid/uuid.h>
+int main(void)
+{
+    uuid_t my_uuid;
+    uuid_generate(my_uuid);
+    return 0;
+}
+EOF
+if $cc $ARCH_CFLAGS -o $TMPE $TMPC -luuid >/dev/null 2>&1; then
+   uuid="yes"
+fi
+
+##########################################
 # vde libraries probe
 if test "$vde" = "yes" ; then
   cat > $TMPC << EOF
@@ -1473,6 +1489,7 @@ echo "Install blobs     $blobs"
 echo -e "KVM support       $kvm"
 echo "fdt support       $fdt"
 echo "preadv support    $preadv"
+echo "uuid support      $uuid"
 
 if test $sdl_too_old = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -1655,6 +1672,10 @@ fi
 if test "$fnmatch" = "yes" ; then
   echo "#define HAVE_FNMATCH_H 1" >> $config_host_h
 fi
+if test "$uuid" = "yes" ; then
+  echo "#define HAVE_UUID_H 1" >> $config_host_h
+  echo "UUID_LIBS=-luuid" >> $config_host_mak
+fi
 qemu_version=`head $source_path/VERSION`
 echo "VERSION=$qemu_version" >>$config_host_mak
 echo "#define QEMU_VERSION \"$qemu_version\"" >> $config_host_h
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH] Add new block driver for the VDI format (use aio)
  2009-07-23 15:58 ` [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format (aio version) Stefan Weil
  2009-07-23 20:27   ` [Qemu-devel] [PATCH] Check availability of uuid header / lib Stefan Weil
@ 2009-07-23 20:29   ` Stefan Weil
  2009-07-24  9:18     ` Kevin Wolf
  2009-07-31 15:25     ` Anthony Liguori
  2009-07-23 20:30   ` [Qemu-devel] [PATCH] add support for new option of vdi format Stefan Weil
  2 siblings, 2 replies; 44+ messages in thread
From: Stefan Weil @ 2009-07-23 20:29 UTC (permalink / raw)
  To: QEMU Developers

This is a new block driver written from scratch
to support the VDI format in QEMU.

VDI is the native format used by Innotek / SUN VirtualBox.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
---
 Makefile      |    2 +-
 block/vdi.c   | 1105 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qemu-img.texi |    2 +
 3 files changed, 1108 insertions(+), 1 deletions(-)
 create mode 100644 block/vdi.c

diff --git a/Makefile b/Makefile
index d8fa730..29f4a65 100644
--- a/Makefile
+++ b/Makefile
@@ -66,7 +66,7 @@ recurse-all: $(SUBDIR_RULES)
 block-obj-y = cutils.o cache-utils.o qemu-malloc.o qemu-option.o module.o
 block-obj-y += nbd.o block.o aio.o aes.o
 
-block-nested-y += cow.o qcow.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
+block-nested-y += cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
 block-nested-y += parallels.o nbd.o
 
diff --git a/block/vdi.c b/block/vdi.c
new file mode 100644
index 0000000..0432446
--- /dev/null
+++ b/block/vdi.c
@@ -0,0 +1,1105 @@
+/*
+ * Block driver for the Virtual Disk Image (VDI) format
+ *
+ * Copyright (c) 2009 Stefan Weil
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) version 3 or any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Reference:
+ * http://forums.virtualbox.org/viewtopic.php?t=8046
+ *
+ * This driver supports create / read / write operations on VDI images.
+ *
+ * Todo (see also TODO in code):
+ *
+ * Some features like snapshots are still missing.
+ *
+ * Deallocation of zero-filled blocks and shrinking images are missing, too
+ * (might be added to common block layer).
+ *
+ * Allocation of blocks could be optimized (less writes to block map and
+ * header).
+ *
+ * Read and write of adjacents blocks could be done in one operation
+ * (current code uses one operation per block (1 MiB).
+ *
+ * The code is not thread safe (missing locks for changes in header and
+ * block table, no problem with current QEMU).
+ *
+ * Hints:
+ *
+ * Blocks (VDI documentation) correspond to clusters (QEMU).
+ * QEMU's backing files could be implemented using VDI snapshot files (TODO).
+ * VDI snapshot files may also contain the complete machine state.
+ * Maybe this machine state can be converted to QEMU PC machine snapshot data.
+ *
+ * The driver keeps a block cache (little endian entries) in memory.
+ * For the standard block size (1 MiB), a terrabyte disk will use 4 MiB RAM,
+ * so this seems to be reasonable.
+ */
+
+#include "qemu-common.h"
+#include "block_int.h"
+#include "module.h"
+
+#if defined(HAVE_UUID_H)
+#include <uuid/uuid.h>
+#else
+/* TODO: move uuid emulation to some central place in QEMU. */
+#include "sysemu.h"     /* UUID_FMT */
+typedef unsigned char uuid_t[16];
+void uuid_generate(uuid_t out);
+void uuid_unparse(uuid_t uu, char *out);
+#endif
+
+/* Code configuration options. */
+
+/* Use old (synchronous) I/O. */
+//~ #undef CONFIG_AIO
+
+/* Enable debug messages. */
+//~ #define CONFIG_VDI_DEBUG
+
+/* Support write operations on VDI images. */
+#define CONFIG_VDI_WRITE
+
+/* Support snapshot images (not implemented yet). */
+//~ #define CONFIG_VDI_SNAPSHOT
+
+/* Enable (currently) unsupported features (not implemented yet). */
+//~ #define CONFIG_VDI_UNSUPPORTED
+
+/* Support non-standard block (cluster) size. */
+//~ #define CONFIG_VDI_BLOCK_SIZE
+
+/* Support static (pre-allocated) images. */
+#define CONFIG_VDI_STATIC_IMAGE
+
+/* Command line option for static images. */
+#define BLOCK_OPT_STATIC "static"
+
+#define KiB     1024
+#define MiB     (KiB * KiB)
+
+#define SECTOR_SIZE 512
+
+#if defined(CONFIG_VDI_DEBUG)
+#define logout(fmt, ...) \
+                fprintf(stderr, "vdi\t%-24s" fmt, __func__, ##__VA_ARGS__)
+#else
+#define logout(fmt, ...) ((void)0)
+#endif
+
+/* Image signature. */
+#define VDI_SIGNATURE 0xbeda107f
+
+/* Image version. */
+#define VDI_VERSION_1_1 0x00010001
+
+/* Image type. */
+#define VDI_TYPE_DYNAMIC 1
+#define VDI_TYPE_STATIC  2
+
+/* Innotek / SUN images use these strings in header.text:
+ * "<<< innotek VirtualBox Disk Image >>>\n"
+ * "<<< Sun xVM VirtualBox Disk Image >>>\n"
+ * "<<< Sun VirtualBox Disk Image >>>\n"
+ * The value does not matter, so QEMU created images use a different text.
+ */
+#define VDI_TEXT "<<< QEMU VM Virtual Disk Image >>>\n"
+
+/* Unallocated blocks use this index (no need to convert endianess). */
+#define VDI_UNALLOCATED UINT32_MAX
+
+#if !defined(HAVE_UUID_H)
+void uuid_generate(uuid_t out)
+{
+    memset(out, 0, sizeof(out));
+}
+
+void uuid_unparse(uuid_t uu, char *out)
+{
+    snprintf(out, 37, UUID_FMT,
+            uu[0], uu[1], uu[2], uu[3], uu[4], uu[5], uu[6], uu[7],
+            uu[8], uu[9], uu[10], uu[11], uu[12], uu[13], uu[14], uu[15]);
+}
+#endif
+
+#if defined(CONFIG_AIO)
+typedef struct {
+    BlockDriverAIOCB common;
+    int64_t sector_num;
+    QEMUIOVector *qiov;
+    uint8_t *buf;
+    /* Total number of sectors. */
+    int nb_sectors;
+    /* Number of sectors for current AIO. */
+    int n_sectors;
+    /* New allocated block map entry. */
+    uint32_t bmap_first;
+    uint32_t bmap_last;
+    /* Buffer for new allocated block. */
+    void *block_buffer;
+    void *orig_buf;
+    int header_modified;
+    BlockDriverAIOCB *hd_aiocb;
+    struct iovec hd_iov;
+    QEMUIOVector hd_qiov;
+    QEMUBH *bh;
+} VdiAIOCB;
+#endif
+
+typedef struct {
+    char text[0x40];
+    uint32_t signature;
+    uint32_t version;
+    uint32_t header_size;
+    uint32_t image_type;
+    uint32_t image_flags;
+    char description[256];
+    uint32_t offset_bmap;
+    uint32_t offset_data;
+    uint32_t cylinders;         /* disk geometry, unused here */
+    uint32_t heads;             /* disk geometry, unused here */
+    uint32_t sectors;           /* disk geometry, unused here */
+    uint32_t sector_size;
+    uint32_t unused1;
+    uint64_t disk_size;
+    uint32_t block_size;
+    uint32_t block_extra;       /* unused here */
+    uint32_t blocks_in_image;
+    uint32_t blocks_allocated;
+    uuid_t uuid_image;
+    uuid_t uuid_last_snap;
+    uuid_t uuid_link;
+    uuid_t uuid_parent;
+    uint64_t unused2[7];
+} VdiHeader;
+
+typedef struct {
+    BlockDriverState *hd;
+    /* The block map entries are little endian (even in memory). */
+    uint32_t *bmap;
+    /* Size of block (bytes). */
+    uint32_t block_size;
+    /* Size of block (sectors). */
+    uint32_t block_sectors;
+    /* First sector of block map. */
+    uint32_t bmap_sector;
+    /* VDI header (converted to host endianess). */
+    VdiHeader header;
+} BDRVVdiState;
+
+static void vdi_header_to_cpu(VdiHeader *header)
+{
+    le32_to_cpus(&header->signature);
+    le32_to_cpus(&header->version);
+    le32_to_cpus(&header->header_size);
+    le32_to_cpus(&header->image_type);
+    le32_to_cpus(&header->image_flags);
+    le32_to_cpus(&header->offset_bmap);
+    le32_to_cpus(&header->offset_data);
+    le32_to_cpus(&header->cylinders);
+    le32_to_cpus(&header->heads);
+    le32_to_cpus(&header->sectors);
+    le32_to_cpus(&header->sector_size);
+    le64_to_cpus(&header->disk_size);
+    le32_to_cpus(&header->block_size);
+    le32_to_cpus(&header->block_extra);
+    le32_to_cpus(&header->blocks_in_image);
+    le32_to_cpus(&header->blocks_allocated);
+}
+
+static void vdi_header_to_le(VdiHeader *header)
+{
+    cpu_to_le32s(&header->signature);
+    cpu_to_le32s(&header->version);
+    cpu_to_le32s(&header->header_size);
+    cpu_to_le32s(&header->image_type);
+    cpu_to_le32s(&header->image_flags);
+    cpu_to_le32s(&header->offset_bmap);
+    cpu_to_le32s(&header->offset_data);
+    cpu_to_le32s(&header->cylinders);
+    cpu_to_le32s(&header->heads);
+    cpu_to_le32s(&header->sectors);
+    cpu_to_le32s(&header->sector_size);
+    cpu_to_le64s(&header->disk_size);
+    cpu_to_le32s(&header->block_size);
+    cpu_to_le32s(&header->block_extra);
+    cpu_to_le32s(&header->blocks_in_image);
+    cpu_to_le32s(&header->blocks_allocated);
+}
+
+#if defined(CONFIG_VDI_DEBUG)
+static void vdi_header_print(VdiHeader *header)
+{
+    char uuid[37];
+    logout("text        %s", header->text);
+    logout("signature   0x%04x\n", header->signature);
+    logout("header size 0x%04x\n", header->header_size);
+    logout("image type  0x%04x\n", header->image_type);
+    logout("image flags 0x%04x\n", header->image_flags);
+    logout("description %s\n", header->description);
+    logout("offset bmap 0x%04x\n", header->offset_bmap);
+    logout("offset data 0x%04x\n", header->offset_data);
+    logout("cylinders   0x%04x\n", header->cylinders);
+    logout("heads       0x%04x\n", header->heads);
+    logout("sectors     0x%04x\n", header->sectors);
+    logout("sector size 0x%04x\n", header->sector_size);
+    logout("image size  0x%" PRIx64 " B (%" PRIu64 " MiB)\n",
+           header->disk_size, header->disk_size / MiB);
+    logout("block size  0x%04x\n", header->block_size);
+    logout("block extra 0x%04x\n", header->block_extra);
+    logout("blocks tot. 0x%04x\n", header->blocks_in_image);
+    logout("blocks all. 0x%04x\n", header->blocks_allocated);
+    uuid_unparse(header->uuid_image, uuid);
+    logout("uuid image  %s\n", uuid);
+    uuid_unparse(header->uuid_last_snap, uuid);
+    logout("uuid snap   %s\n", uuid);
+    uuid_unparse(header->uuid_link, uuid);
+    logout("uuid link   %s\n", uuid);
+    uuid_unparse(header->uuid_parent, uuid);
+    logout("uuid parent %s\n", uuid);
+}
+#endif
+
+static int vdi_check(BlockDriverState *bs)
+{
+    /* TODO: additional checks possible. */
+    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
+    int n_errors = 0;
+    uint32_t blocks_allocated = 0;
+    uint32_t block;
+    uint32_t *bmap;
+    logout("\n");
+
+    bmap = qemu_malloc(s->header.blocks_in_image * sizeof(uint32_t));
+    memset(bmap, 0xff, s->header.blocks_in_image * sizeof(uint32_t));
+
+    /* Check block map and value of blocks_allocated. */
+    for (block = 0; block < s->header.blocks_in_image; block++) {
+        uint32_t bmap_entry = le32_to_cpu(s->bmap[block]);
+        if (bmap_entry != VDI_UNALLOCATED) {
+            if (bmap_entry < s->header.blocks_in_image) {
+                blocks_allocated++;
+                if (bmap[bmap_entry] == VDI_UNALLOCATED) {
+                    bmap[bmap_entry] = bmap_entry;
+                } else {
+                    fprintf(stderr, "ERROR: block index %" PRIu32
+                            " also used by %" PRIu32 "\n", bmap[bmap_entry], bmap_entry);
+                }
+            } else {
+                fprintf(stderr, "ERROR: block index %" PRIu32
+                        " too large, is %" PRIu32 "\n", block, bmap_entry);
+                n_errors++;
+            }
+        }
+    }
+    if (blocks_allocated != s->header.blocks_allocated) {
+        fprintf(stderr, "ERROR: allocated blocks mismatch, is %" PRIu32
+               ", should be %" PRIu32 "\n",
+               blocks_allocated, s->header.blocks_allocated);
+        n_errors++;
+    }
+
+    qemu_free(bmap);
+
+    return n_errors;
+}
+
+static int vdi_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+    /* TODO: vdi_get_info would be needed for machine snapshots.
+       vm_state_offset is still missing. */
+    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
+    logout("\n");
+    bdi->cluster_size = s->block_size;
+    bdi->vm_state_offset = 0;
+    return 0;
+}
+
+static int vdi_make_empty(BlockDriverState *bs)
+{
+    /* TODO: missing code. */
+    logout("\n");
+    return 0;
+}
+
+static int vdi_probe(const uint8_t *buf, int buf_size, const char *filename)
+{
+    const VdiHeader *header = (const VdiHeader *)buf;
+    int result = 0;
+
+    logout("\n");
+
+    if (buf_size < sizeof(*header)) {
+        /* Header too small, no VDI. */
+    } else if (le32_to_cpu(header->signature) == VDI_SIGNATURE) {
+        result = 100;
+    }
+
+    if (result == 0) {
+        logout("no vdi image\n");
+    } else {
+        logout("%s", header->text);
+    }
+
+    return result;
+}
+
+#if defined(CONFIG_VDI_SNAPSHOT)
+static int vdi_snapshot_create(const char *filename, const char *backing_file)
+{
+    /* TODO: missing code. */
+    logout("\n");
+    return -1;
+}
+#endif
+
+static int vdi_open(BlockDriverState *bs, const char *filename, int flags)
+{
+    BDRVVdiState *s = bs->opaque;
+    VdiHeader header;
+    size_t bmap_size;
+    int ret;
+
+    logout("\n");
+
+    ret = bdrv_file_open(&s->hd, filename, flags);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (bdrv_read(s->hd, 0, (uint8_t *)&header, 1) < 0) {
+        goto fail;
+    }
+
+    vdi_header_to_cpu(&header);
+#if defined(CONFIG_VDI_DEBUG)
+    vdi_header_print(&header);
+#endif
+
+    if (header.version != VDI_VERSION_1_1) {
+        logout("unsupported version %u.%u\n",
+               header.version >> 16, header.version & 0xffff);
+        goto fail;
+    } else if (header.offset_bmap % SECTOR_SIZE != 0) {
+        /* We only support block maps which start on a sector boundary. */
+        logout("unsupported block map offset 0x%x B\n", header.offset_bmap);
+        goto fail;
+    } else if (header.offset_data % SECTOR_SIZE != 0) {
+        /* We only support data blocks which start on a sector boundary. */
+        logout("unsupported data offset 0x%x B\n", header.offset_data);
+        goto fail;
+    } else if (header.sector_size != SECTOR_SIZE) {
+        logout("unsupported sector size %u B\n", header.sector_size);
+        goto fail;
+    } else if (header.block_size != 1 * MiB) {
+        logout("unsupported block size %u B\n", header.block_size);
+        goto fail;
+    } else if (header.disk_size !=
+               (uint64_t)header.blocks_in_image * header.block_size) {
+        logout("unexpected block number %u B\n", header.blocks_in_image);
+        goto fail;
+    }
+
+    bs->total_sectors = header.disk_size / SECTOR_SIZE;
+
+    s->block_size = header.block_size;
+    s->block_sectors = header.block_size / SECTOR_SIZE;
+    s->bmap_sector = header.offset_bmap / SECTOR_SIZE;
+    s->header = header;
+
+    bmap_size = header.blocks_in_image * sizeof(uint32_t);
+    s->bmap = qemu_malloc(bmap_size);
+    if (bdrv_read(s->hd, s->bmap_sector,
+                  (uint8_t *)s->bmap, bmap_size / SECTOR_SIZE) < 0) {
+        goto fail_free_bmap;
+    }
+
+    return 0;
+
+ fail_free_bmap:
+    qemu_free(s->bmap);
+
+ fail:
+    bdrv_delete(s->hd);
+    return -1;
+}
+
+static int vdi_is_allocated(BlockDriverState *bs, int64_t sector_num,
+                             int nb_sectors, int *pnum)
+{
+    /* TODO: Check for too large sector_num (in bdrv_is_allocated or here). */
+    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
+    size_t bmap_index = sector_num / s->block_sectors;
+    size_t sector_in_block = sector_num % s->block_sectors;
+    int n_sectors = s->block_sectors - sector_in_block;
+    uint32_t bmap_entry = le32_to_cpu(s->bmap[bmap_index]);
+    logout("%p, %" PRId64 ", %d, %p\n", bs, sector_num, nb_sectors, pnum);
+    if (n_sectors > nb_sectors) {
+        n_sectors = nb_sectors;
+    }
+    *pnum = n_sectors;
+    return bmap_entry != VDI_UNALLOCATED;
+}
+
+#if defined(CONFIG_AIO)
+
+#if 0
+static void vdi_aio_remove(VdiAIOCB *acb)
+{
+    logout("\n");
+#if 0
+    VdiAIOCB **pacb;
+
+    /* remove the callback from the queue */
+    pacb = &posix_aio_state->first_aio;
+    for(;;) {
+        if (*pacb == NULL) {
+            fprintf(stderr, "vdi_aio_remove: aio request not found!\n");
+            break;
+        } else if (*pacb == acb) {
+            *pacb = acb->next;
+            qemu_aio_release(acb);
+            break;
+        }
+        pacb = &(*pacb)->next;
+    }
+#endif
+}
+#endif
+
+static void vdi_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+    logout("\n");
+
+#if 0
+    int ret;
+    VdiAIOCB *acb = (VdiAIOCB *)blockacb;
+
+    ret = qemu_paio_cancel(acb->aiocb.aio_fildes, &acb->aiocb);
+    if (ret == QEMU_PAIO_NOTCANCELED) {
+        /* fail safe: if the aio could not be canceled, we wait for
+           it */
+        while (qemu_paio_error(&acb->aiocb) == EINPROGRESS);
+    }
+
+    vdi_aio_remove(acb);
+#endif
+}
+
+static AIOPool vdi_aio_pool = {
+    .aiocb_size = sizeof(VdiAIOCB),
+    .cancel = vdi_aio_cancel,
+};
+
+static VdiAIOCB *vdi_aio_setup(BlockDriverState *bs, int64_t sector_num,
+        QEMUIOVector *qiov, int nb_sectors,
+        BlockDriverCompletionFunc *cb, void *opaque, int is_write)
+{
+    VdiAIOCB *acb;
+
+    logout("%p, %" PRId64 ", %p, %d, %p, %p, %d\n",
+           bs, sector_num, qiov, nb_sectors, cb, opaque, is_write);
+
+    acb = qemu_aio_get(&vdi_aio_pool, bs, cb, opaque);
+    if (acb) {
+        acb->hd_aiocb = NULL;
+        acb->sector_num = sector_num;
+        acb->qiov = qiov;
+        if (qiov->niov > 1) {
+            acb->buf = qemu_blockalign(bs, qiov->size);
+            acb->orig_buf = acb->buf;
+            if (is_write) {
+                qemu_iovec_to_buffer(qiov, acb->buf);
+            }
+        } else {
+            acb->buf = (uint8_t *)qiov->iov->iov_base;
+        }
+        acb->nb_sectors = nb_sectors;
+        acb->n_sectors = 0;
+        acb->bmap_first = VDI_UNALLOCATED;
+        acb->bmap_last = VDI_UNALLOCATED;
+        acb->block_buffer = NULL;
+        acb->header_modified = 0;
+    }
+    return acb;
+}
+
+static int vdi_schedule_bh(QEMUBHFunc *cb, VdiAIOCB *acb)
+{
+    logout("\n");
+
+    if (acb->bh) {
+        return -EIO;
+    }
+
+    acb->bh = qemu_bh_new(cb, acb);
+    if (!acb->bh) {
+        return -EIO;
+    }
+
+    qemu_bh_schedule(acb->bh);
+
+    return 0;
+}
+
+static void vdi_aio_read_cb(void *opaque, int ret);
+
+static void vdi_aio_read_bh(void *opaque)
+{
+    VdiAIOCB *acb = opaque;
+    logout("\n");
+    qemu_bh_delete(acb->bh);
+    acb->bh = NULL;
+    vdi_aio_read_cb(opaque, 0);
+}
+
+static void vdi_aio_read_cb(void *opaque, int ret)
+{
+    VdiAIOCB *acb = opaque;
+    BlockDriverState *bs = acb->common.bs;
+    BDRVVdiState *s = bs->opaque;
+    uint32_t bmap_entry;
+    uint32_t block_index;
+    uint32_t sector_in_block;
+    uint32_t n_sectors;
+
+    logout("%u sectors read\n", acb->n_sectors);
+
+    acb->hd_aiocb = NULL;
+
+    if (ret < 0) {
+        goto done;
+    }
+
+    acb->nb_sectors -= acb->n_sectors;
+
+    if (acb->nb_sectors == 0) {
+        /* request completed */
+        ret = 0;
+        goto done;
+    }
+
+    acb->sector_num += acb->n_sectors;
+    acb->buf += acb->n_sectors * SECTOR_SIZE;
+
+    block_index = acb->sector_num / s->block_sectors;
+    sector_in_block = acb->sector_num % s->block_sectors;
+    n_sectors = s->block_sectors - sector_in_block;
+    if (n_sectors > acb->nb_sectors) {
+        n_sectors = acb->nb_sectors;
+    }
+
+    logout("will read %u sectors starting at sector %" PRIu64 "\n",
+           n_sectors, acb->sector_num);
+
+    /* prepare next AIO request */
+    acb->n_sectors = n_sectors;
+    bmap_entry = le32_to_cpu(s->bmap[block_index]);
+    if (bmap_entry == VDI_UNALLOCATED) {
+        /* Block not allocated, return zeros, no need to wait. */
+        memset(acb->buf, 0, n_sectors * SECTOR_SIZE);
+        ret = vdi_schedule_bh(vdi_aio_read_bh, acb);
+        if (ret < 0) {
+            goto done;
+        }
+    } else {
+        uint64_t offset = s->header.offset_data / SECTOR_SIZE +
+                          (uint64_t)bmap_entry * s->block_sectors +
+                          sector_in_block;
+        acb->hd_iov.iov_base = (void *)acb->buf;
+        acb->hd_iov.iov_len = n_sectors * SECTOR_SIZE;
+        qemu_iovec_init_external(&acb->hd_qiov, &acb->hd_iov, 1);
+        acb->hd_aiocb = bdrv_aio_readv(s->hd, offset, &acb->hd_qiov,
+                                       n_sectors, vdi_aio_read_cb, acb);
+        if (acb->hd_aiocb == NULL) {
+            goto done;
+        }
+    }
+    return;
+done:
+    if (acb->qiov->niov > 1) {
+        qemu_iovec_from_buffer(acb->qiov, acb->orig_buf, acb->qiov->size);
+        qemu_vfree(acb->orig_buf);
+    }
+    acb->common.cb(acb->common.opaque, ret);
+    qemu_aio_release(acb);
+}
+
+static BlockDriverAIOCB *vdi_aio_readv(BlockDriverState *bs,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    VdiAIOCB *acb;
+    logout("\n");
+    acb = vdi_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 0);
+    if (!acb) {
+        return NULL;
+    }
+    vdi_aio_read_cb(acb, 0);
+    return &acb->common;
+}
+
+static void vdi_aio_write_cb(void *opaque, int ret)
+{
+    VdiAIOCB *acb = opaque;
+    BlockDriverState *bs = acb->common.bs;
+    BDRVVdiState *s = bs->opaque;
+    uint32_t bmap_entry;
+    uint32_t block_index;
+    uint32_t sector_in_block;
+    uint32_t n_sectors;
+
+    acb->hd_aiocb = NULL;
+
+    if (ret < 0) {
+        goto done;
+    }
+
+    acb->nb_sectors -= acb->n_sectors;
+    acb->sector_num += acb->n_sectors;
+    acb->buf += acb->n_sectors * SECTOR_SIZE;
+
+    if (acb->nb_sectors == 0) {
+        logout("finished data write\n");
+        acb->n_sectors = 0;
+        if (acb->header_modified) {
+            VdiHeader *header = acb->block_buffer;
+            logout("now writing modified header\n");
+            assert(acb->bmap_first != VDI_UNALLOCATED);
+            *header = s->header;
+            vdi_header_to_le(header);
+            acb->header_modified = 0;
+            acb->hd_iov.iov_base = acb->block_buffer;
+            acb->hd_iov.iov_len = SECTOR_SIZE;
+            qemu_iovec_init_external(&acb->hd_qiov, &acb->hd_iov, 1);
+            acb->hd_aiocb = bdrv_aio_writev(s->hd, 0, &acb->hd_qiov, 1,
+                                            vdi_aio_write_cb, acb);
+            if (acb->hd_aiocb == NULL) {
+                goto done;
+            }
+            return;
+        } else if (acb->bmap_first != VDI_UNALLOCATED) {
+            /* One or more new blocks were allocated. */
+            uint64_t offset;
+            uint32_t bmap_first;
+            uint32_t bmap_last;
+            qemu_free(acb->block_buffer);
+            acb->block_buffer = NULL;
+            bmap_first = acb->bmap_first;
+            bmap_last = acb->bmap_last;
+            logout("now writing modified block map entry %u...%u\n",
+                   bmap_first, bmap_last);
+            /* Write modified sectors from block map. */
+            bmap_first /= (SECTOR_SIZE / sizeof(uint32_t));
+            bmap_last /= (SECTOR_SIZE / sizeof(uint32_t));
+            n_sectors = bmap_last - bmap_first + 1;
+            offset = s->bmap_sector + bmap_first;
+            acb->bmap_first = VDI_UNALLOCATED;
+            acb->hd_iov.iov_base = (uint8_t *)&s->bmap[0] +
+                                   bmap_first * SECTOR_SIZE;
+            acb->hd_iov.iov_len = n_sectors * SECTOR_SIZE;
+            qemu_iovec_init_external(&acb->hd_qiov, &acb->hd_iov, 1);
+            logout("will write %u block map sectors starting from entry %u\n",
+                   n_sectors, bmap_first);
+            acb->hd_aiocb = bdrv_aio_writev(s->hd, offset, &acb->hd_qiov,
+                                            n_sectors, vdi_aio_write_cb, acb);
+            if (acb->hd_aiocb == NULL) {
+                goto done;
+            }
+            return;
+        }
+        ret = 0;
+        goto done;
+    }
+
+    logout("%u sectors written\n", acb->n_sectors);
+
+    block_index = acb->sector_num / s->block_sectors;
+    sector_in_block = acb->sector_num % s->block_sectors;
+    n_sectors = s->block_sectors - sector_in_block;
+    if (n_sectors > acb->nb_sectors) {
+        n_sectors = acb->nb_sectors;
+    }
+
+    logout("will write %u sectors starting at sector %" PRIu64 "\n",
+           n_sectors, acb->sector_num);
+
+    /* prepare next AIO request */
+    acb->n_sectors = n_sectors;
+    bmap_entry = le32_to_cpu(s->bmap[block_index]);
+    if (bmap_entry == VDI_UNALLOCATED) {
+        /* Allocate new block and write to it. */
+        uint64_t offset;
+        uint8_t *block;
+        bmap_entry = s->header.blocks_allocated;
+        s->bmap[block_index] = cpu_to_le32(bmap_entry);
+        s->header.blocks_allocated++;
+        offset = s->header.offset_data / SECTOR_SIZE +
+                 (uint64_t)bmap_entry * s->block_sectors;
+        block = acb->block_buffer;
+        if (block == NULL) {
+            block = qemu_mallocz(s->block_size);
+            acb->block_buffer = block;
+            acb->bmap_first = block_index;
+            assert(!acb->header_modified);
+            acb->header_modified = 1;
+        }
+        acb->bmap_last = block_index;
+        memcpy(block + sector_in_block * SECTOR_SIZE,
+               acb->buf, n_sectors * SECTOR_SIZE);
+        acb->hd_iov.iov_base = block;
+        acb->hd_iov.iov_len = s->block_size;
+        qemu_iovec_init_external(&acb->hd_qiov, &acb->hd_iov, 1);
+        acb->hd_aiocb = bdrv_aio_writev(s->hd, offset,
+                                        &acb->hd_qiov, s->block_sectors,
+                                        vdi_aio_write_cb, acb);
+        if (acb->hd_aiocb == NULL) {
+            goto done;
+        }
+    } else {
+        uint64_t offset = s->header.offset_data / SECTOR_SIZE +
+                          (uint64_t)bmap_entry * s->block_sectors +
+                          sector_in_block;
+        acb->hd_iov.iov_base = acb->buf;
+        acb->hd_iov.iov_len = n_sectors * SECTOR_SIZE;
+        qemu_iovec_init_external(&acb->hd_qiov, &acb->hd_iov, 1);
+        acb->hd_aiocb = bdrv_aio_writev(s->hd, offset, &acb->hd_qiov,
+                                        n_sectors, vdi_aio_write_cb, acb);
+        if (acb->hd_aiocb == NULL) {
+            goto done;
+        }
+    }
+
+    return;
+
+done:
+    if (acb->qiov->niov > 1) {
+        qemu_vfree(acb->orig_buf);
+    }
+    acb->common.cb(acb->common.opaque, ret);
+    qemu_aio_release(acb);
+}
+
+static BlockDriverAIOCB *vdi_aio_writev(BlockDriverState *bs,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    VdiAIOCB *acb;
+    logout("\n");
+    acb = vdi_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 1);
+    if (!acb) {
+        return NULL;
+    }
+    vdi_aio_write_cb(acb, 0);
+    return &acb->common;
+}
+
+#else /* CONFIG_AIO */
+
+static int vdi_read(BlockDriverState *bs, int64_t sector_num,
+                    uint8_t *buf, int nb_sectors)
+{
+    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
+    logout("%p, %" PRId64 ", %p, %d\n", bs, sector_num, buf, nb_sectors);
+    if (sector_num < 0) {
+        logout("unsupported sector %" PRId64 "\n", sector_num);
+        return -1;
+    }
+    while (nb_sectors > 0 && sector_num < bs->total_sectors) {
+        uint32_t bmap_entry;
+        size_t block_index = sector_num / s->block_sectors;
+        size_t sector_in_block = sector_num % s->block_sectors;
+        size_t n_sectors = s->block_sectors - sector_in_block;
+        if (n_sectors > nb_sectors) {
+            n_sectors = nb_sectors;
+        }
+        bmap_entry = le32_to_cpu(s->bmap[block_index]);
+        if (bmap_entry == VDI_UNALLOCATED) {
+            /* Block not allocated, return zeros. */
+            memset(buf, 0, n_sectors * SECTOR_SIZE);
+        } else {
+            uint64_t offset = s->header.offset_data / SECTOR_SIZE +
+                (uint64_t)bmap_entry * s->block_sectors + sector_in_block;
+            if (bdrv_read(s->hd, offset, buf, n_sectors) < 0) {
+                logout("read error\n");
+                return -1;
+            }
+        }
+        buf += n_sectors * SECTOR_SIZE;
+        sector_num += n_sectors;
+        nb_sectors -= n_sectors;
+    }
+    return 0;
+}
+
+#if defined(CONFIG_VDI_WRITE)
+static int vdi_write(BlockDriverState *bs, int64_t sector_num,
+                     const uint8_t *buf, int nb_sectors)
+{
+    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
+    logout("%p, %" PRId64 ", %p, %d\n", bs, sector_num, buf, nb_sectors);
+    if (sector_num < 0) {
+        logout("unsupported sector %" PRId64 "\n", sector_num);
+        return -1;
+    }
+    while (nb_sectors > 0 && sector_num < bs->total_sectors) {
+        uint32_t bmap_entry;
+        uint64_t offset;
+        size_t block_index = sector_num / s->block_sectors;
+        size_t sector_in_block = sector_num % s->block_sectors;
+        size_t n_sectors = s->block_sectors - sector_in_block;
+        if (n_sectors > nb_sectors) {
+            n_sectors = nb_sectors;
+        }
+        bmap_entry = le32_to_cpu(s->bmap[block_index]);
+        if (bmap_entry == VDI_UNALLOCATED) {
+            /* Allocate new block and write to it. */
+            VdiHeader header;
+            uint8_t *block;
+            bmap_entry = s->header.blocks_allocated;
+            s->bmap[block_index] = cpu_to_le32(bmap_entry);
+            s->header.blocks_allocated++;
+            offset = s->header.offset_data / SECTOR_SIZE +
+                     (uint64_t)bmap_entry * s->block_sectors;
+            block = qemu_mallocz(s->block_size);
+            memcpy(block + sector_in_block * SECTOR_SIZE,
+                   buf, n_sectors * SECTOR_SIZE);
+            if (bdrv_write(s->hd, offset, block, s->block_sectors) < 0) {
+                qemu_free(block);
+                logout("write error\n");
+                return -1;
+            }
+            qemu_free(block);
+
+            /* Write modified sector from block map. */
+            block_index /= (SECTOR_SIZE / sizeof(uint32_t));
+            offset = s->bmap_sector + block_index;
+            if (bdrv_write(s->hd, offset,
+                           (uint8_t *)&s->bmap[bmap_entry], 1) < 0) {
+                logout("write error\n");
+                return -1;
+            }
+
+            /* Write modified header (blocks_allocated). */
+            header = s->header;
+            vdi_header_to_le(&header);
+            if (bdrv_write(s->hd, 0, (uint8_t *)&header, 1) < 0) {
+                logout("write error\n");
+                return -1;
+            }
+        } else {
+            /* Write to existing block. */
+            offset = s->header.offset_data / SECTOR_SIZE +
+                (uint64_t)bmap_entry * s->block_sectors +
+                sector_in_block;
+            if (bdrv_write(s->hd, offset, buf, n_sectors) < 0) {
+                logout("write error\n");
+                return -1;
+            }
+        }
+        buf += n_sectors * SECTOR_SIZE;
+        sector_num += n_sectors;
+        nb_sectors -= n_sectors;
+    }
+    return 0;
+}
+#endif /* CONFIG_VDI_WRITE */
+
+#endif /* CONFIG_AIO */
+
+static int vdi_create(const char *filename, QEMUOptionParameter *options)
+{
+    /* TODO: Support pre-allocated images. */
+    int fd;
+    int result = 0;
+    uint64_t bytes = 0;
+    uint32_t blocks;
+    size_t block_size = 1 * MiB;
+    uint32_t image_type = VDI_TYPE_DYNAMIC;
+    VdiHeader header;
+    size_t i;
+    size_t bmap_size;
+    uint32_t *bmap;
+
+    logout("\n");
+
+    /* Read out options. */
+    while (options && options->name) {
+        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
+            bytes = options->value.n;
+#if defined(CONFIG_VDI_BLOCK_SIZE)
+        } else if (!strcmp(options->name, BLOCK_OPT_CLUSTER_SIZE)) {
+            if (options->value.n) {
+                /* TODO: Additional checks (SECTOR_SIZE * 2^n, ...). */
+                block_size = options->value.n;
+            }
+#endif
+#if defined(CONFIG_VDI_STATIC_IMAGE)
+        } else if (!strcmp(options->name, BLOCK_OPT_STATIC)) {
+            image_type = VDI_TYPE_STATIC;
+#endif
+        }
+        options++;
+    }
+
+    fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE,
+              0644);
+    if (fd < 0) {
+        return -errno;
+    }
+
+    blocks = bytes / block_size;
+    bmap_size = blocks * sizeof(uint32_t);
+    bmap_size = ((bmap_size + SECTOR_SIZE - 1) & ~(SECTOR_SIZE -1));
+
+    memset(&header, 0, sizeof(header));
+    strcpy(header.text, VDI_TEXT);
+    header.signature = VDI_SIGNATURE;
+    header.version = VDI_VERSION_1_1;
+    header.header_size = 0x180;
+    header.image_type = image_type;
+    header.offset_bmap = 0x200;
+    header.offset_data = 0x200 + bmap_size;
+    header.sector_size = SECTOR_SIZE;
+    header.disk_size = bytes;
+    header.block_size = block_size;
+    header.blocks_in_image = blocks;
+    uuid_generate(header.uuid_image);
+    uuid_generate(header.uuid_last_snap);
+#if 0
+    uuid_generate(header.uuid_link);
+    uuid_generate(header.uuid_parent);
+#endif
+#if defined(CONFIG_VDI_DEBUG)
+    vdi_header_print(&header);
+#endif
+    vdi_header_to_le(&header);
+    if (write(fd, &header, sizeof(header)) < 0) {
+        result = -errno;
+    }
+
+    bmap = (uint32_t *)qemu_mallocz(bmap_size);
+    for (i = 0; i < blocks; i++) {
+        bmap[i] = VDI_UNALLOCATED;
+    }
+    if (write(fd, bmap, bmap_size) < 0) {
+        result = -errno;
+    }
+    qemu_free(bmap);
+
+    if (close(fd) < 0) {
+        result = -errno;
+    }
+
+    return result;
+}
+
+static void vdi_close(BlockDriverState *bs)
+{
+    BDRVVdiState *s = bs->opaque;
+    logout("\n");
+    bdrv_delete(s->hd);
+}
+
+static void vdi_flush(BlockDriverState *bs)
+{
+    BDRVVdiState *s = bs->opaque;
+    logout("\n");
+    bdrv_flush(s->hd);
+}
+
+
+static QEMUOptionParameter vdi_create_options[] = {
+    {
+        .name = BLOCK_OPT_SIZE,
+        .type = OPT_SIZE,
+        .help = "Virtual disk size"
+    },
+#if defined(CONFIG_VDI_BLOCK_SIZE)
+    {
+        .name = BLOCK_OPT_CLUSTER_SIZE,
+        .type = OPT_SIZE,
+        .help = "VDI cluster (block) size"
+    },
+#endif
+#if defined(CONFIG_VDI_STATIC_IMAGE)
+    {
+        .name = BLOCK_OPT_STATIC,
+        .type = OPT_FLAG,
+        .help = "VDI static (pre-allocated) image"
+    },
+#endif
+    { NULL }
+};
+
+static BlockDriver bdrv_vdi = {
+    .format_name = "vdi",
+    .instance_size = sizeof(BDRVVdiState),
+    .bdrv_probe = vdi_probe,
+    .bdrv_open = vdi_open,
+    .bdrv_close = vdi_close,
+    .bdrv_create = vdi_create,
+    .bdrv_flush = vdi_flush,
+#if defined(CONFIG_VDI_UNSUPPORTED)
+    .bdrv_getlength = vdi_getlength,
+#endif
+    .bdrv_is_allocated = vdi_is_allocated,
+#if defined(CONFIG_VDI_UNSUPPORTED)
+    .bdrv_set_key = vdi_set_key,
+#endif
+    .bdrv_make_empty = vdi_make_empty,
+
+#ifdef CONFIG_AIO
+    .bdrv_aio_readv = vdi_aio_readv,
+#if defined(CONFIG_VDI_WRITE)
+    .bdrv_aio_writev = vdi_aio_writev,
+#endif
+#else
+    .bdrv_read = vdi_read,
+#if defined(CONFIG_VDI_WRITE)
+    .bdrv_write = vdi_write,
+#endif
+#endif
+
+#if defined(CONFIG_VDI_UNSUPPORTED)
+    .bdrv_write_compressed = vdi_write_compressed,
+#endif
+
+#if defined(CONFIG_VDI_SNAPSHOT)
+    .bdrv_snapshot_create = vdi_snapshot_create,
+    .bdrv_snapshot_goto = vdi_snapshot_goto,
+    .bdrv_snapshot_delete = vdi_snapshot_delete,
+    .bdrv_snapshot_list = vdi_snapshot_list,
+#endif
+    .bdrv_get_info = vdi_get_info,
+
+#if defined(CONFIG_VDI_UNSUPPORTED)
+    .bdrv_put_buffer = vdi_put_buffer,
+    .bdrv_get_buffer = vdi_get_buffer,
+#endif
+
+    .create_options = vdi_create_options,
+    .bdrv_check = vdi_check,
+};
+
+static void bdrv_vdi_init(void)
+{
+    logout("\n");
+    bdrv_register(&bdrv_vdi);
+}
+
+block_init(bdrv_vdi_init);
diff --git a/qemu-img.texi b/qemu-img.texi
index 49d4e59..69e24b5 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -48,6 +48,8 @@ Old QEMU image format. Left for compatibility.
 User Mode Linux Copy On Write image format. Used to be the only growable
 image format in QEMU. It is supported only for compatibility with
 previous versions. It does not work on win32.
+@item vdi
+VirtualBox 1.1 compatible image format.
 @item vmdk
 VMware 3 and 4 compatible image format.
 @item cloop
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH] add support for new option of vdi format
  2009-07-23 15:58 ` [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format (aio version) Stefan Weil
  2009-07-23 20:27   ` [Qemu-devel] [PATCH] Check availability of uuid header / lib Stefan Weil
  2009-07-23 20:29   ` [Qemu-devel] [PATCH] Add new block driver for the VDI format (use aio) Stefan Weil
@ 2009-07-23 20:30   ` Stefan Weil
  2009-07-23 20:34     ` [Qemu-devel] " Stefan Weil
                       ` (2 more replies)
  2 siblings, 3 replies; 44+ messages in thread
From: Stefan Weil @ 2009-07-23 20:30 UTC (permalink / raw)
  Cc: Christoph Hellwig, QEMU Developers

VDI supports an image option 'static'.
Ignore "static=off" from qemu-img output.

Cc: Christoph Hellwig <hch@lst.de>
Cc: QEMU Developers <qemu-devel@nongnu.org>
Signed-off-by: Stefan Weil <weil@mail.berlios.de>
---
 common.rc |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/common.rc b/common.rc
index 16cf77d..4ba5c71 100644
--- a/common.rc
+++ b/common.rc
@@ -61,8 +61,8 @@ _make_test_img()
     	sed -e "s#$IMGFMT#IMGFMT#g" | \
 	sed -e "s# encryption=off##g" | \
 	sed -e "s# cluster_size=0##g" | \
-	sed -e "s# compat6=off##g"
-
+        sed -e "s# compat6=off##g" | \
+        sed -e "s# static=off##g"
 }
 
 _cleanup_test_img()
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [Qemu-devel] Re: [PATCH] add support for new option of vdi format
  2009-07-23 20:30   ` [Qemu-devel] [PATCH] add support for new option of vdi format Stefan Weil
@ 2009-07-23 20:34     ` Stefan Weil
  2009-07-31 14:59     ` [Qemu-devel] " Christoph Hellwig
  2009-08-13 16:53     ` Christoph Hellwig
  2 siblings, 0 replies; 44+ messages in thread
From: Stefan Weil @ 2009-07-23 20:34 UTC (permalink / raw)
  Cc: Christoph Hellwig, QEMU Developers

Stefan Weil schrieb:
> VDI supports an image option 'static'.
> Ignore "static=off" from qemu-img output.
>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: QEMU Developers <qemu-devel@nongnu.org>
> Signed-off-by: Stefan Weil <weil@mail.berlios.de>
> ---
>  common.rc |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/common.rc b/common.rc
> index 16cf77d..4ba5c71 100644
> --- a/common.rc
> +++ b/common.rc
> @@ -61,8 +61,8 @@ _make_test_img()
>      	sed -e "s#$IMGFMT#IMGFMT#g" | \
>  	sed -e "s# encryption=off##g" | \
>  	sed -e "s# cluster_size=0##g" | \
> -	sed -e "s# compat6=off##g"
> -
> +        sed -e "s# compat6=off##g" | \
> +        sed -e "s# static=off##g"
>  }
>  
>  _cleanup_test_img()
>   


The patch is for qemu-io-tests, not for QEMU.
I should have added this information to the subject line.

Regards, Stefan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Check availability of uuid header / lib
  2009-07-23 20:27   ` [Qemu-devel] [PATCH] Check availability of uuid header / lib Stefan Weil
@ 2009-07-24  6:32     ` Christoph Egger
  2009-10-01 18:13       ` Stefan Weil
  2009-10-01 18:10     ` [Qemu-devel] [PATCH] Check availability of uuid header / library Stefan Weil
  1 sibling, 1 reply; 44+ messages in thread
From: Christoph Egger @ 2009-07-24  6:32 UTC (permalink / raw)
  To: qemu-devel

On Thursday 23 July 2009 22:27:54 Stefan Weil wrote:
> The Universally Unique Identifier library will be used
> for the new vdi block driver and maybe other parts of QEMU.

This is very Linux specific.
On NetBSD, the header is in <sys/uuid.h> and part of libc.
The API implements DCE 1.1 RPC specification which is
very different from Linux uuid.

Christoph


>
> Signed-off-by: Stefan Weil <weil@mail.berlios.de>
> ---
>  Makefile        |    1 +
>  Makefile.target |    2 ++
>  configure       |   21 +++++++++++++++++++++
>  3 files changed, 24 insertions(+), 0 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index dc95869..d8fa730 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -29,6 +29,7 @@ else
>  DOCS=
>  endif
>
> +LIBS+=$(UUID_LIBS)
>  LIBS+=$(PTHREADLIBS)
>  LIBS+=$(CLOCKLIBS)
>
> diff --git a/Makefile.target b/Makefile.target
> index f9cd42a..4a01e96 100644
> --- a/Makefile.target
> +++ b/Makefile.target
> @@ -469,6 +469,8 @@ ifdef CONFIG_BLUEZ
>  LIBS += $(CONFIG_BLUEZ_LIBS)
>  endif
>
> +LIBS += $(UUID_LIBS)
> +
>  # xen backend driver support
>  obj-$(CONFIG_XEN) += xen_machine_pv.o xen_domainbuild.o
>  ifeq ($(CONFIG_XEN), y)
> diff --git a/configure b/configure
> index 39bed79..28a9c48 100755
> --- a/configure
> +++ b/configure
> @@ -995,6 +995,22 @@ if $cc $ARCH_CFLAGS -o $TMPE $TMPC > /dev/null 2>
> /dev/null ; then fi
>
>  ##########################################
> +# uuid_generate() probe, used for vdi block driver
> +uuid="no"
> +cat > $TMPC << EOF
> +#include <uuid/uuid.h>
> +int main(void)
> +{
> +    uuid_t my_uuid;
> +    uuid_generate(my_uuid);
> +    return 0;
> +}
> +EOF
> +if $cc $ARCH_CFLAGS -o $TMPE $TMPC -luuid >/dev/null 2>&1; then
> +   uuid="yes"
> +fi
> +
> +##########################################
>  # vde libraries probe
>  if test "$vde" = "yes" ; then
>    cat > $TMPC << EOF
> @@ -1473,6 +1489,7 @@ echo "Install blobs     $blobs"
>  echo -e "KVM support       $kvm"
>  echo "fdt support       $fdt"
>  echo "preadv support    $preadv"
> +echo "uuid support      $uuid"
>
>  if test $sdl_too_old = "yes"; then
>  echo "-> Your SDL version is too old - please upgrade to have SDL support"
> @@ -1655,6 +1672,10 @@ fi
>  if test "$fnmatch" = "yes" ; then
>    echo "#define HAVE_FNMATCH_H 1" >> $config_host_h
>  fi
> +if test "$uuid" = "yes" ; then
> +  echo "#define HAVE_UUID_H 1" >> $config_host_h
> +  echo "UUID_LIBS=-luuid" >> $config_host_mak
> +fi
>  qemu_version=`head $source_path/VERSION`
>  echo "VERSION=$qemu_version" >>$config_host_mak
>  echo "#define QEMU_VERSION \"$qemu_version\"" >> $config_host_h



-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Thomas M. McCoy, Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Add new block driver for the VDI format (use aio)
  2009-07-23 20:29   ` [Qemu-devel] [PATCH] Add new block driver for the VDI format (use aio) Stefan Weil
@ 2009-07-24  9:18     ` Kevin Wolf
  2009-07-24 16:20       ` Stefan Weil
  2009-07-31 15:25     ` Anthony Liguori
  1 sibling, 1 reply; 44+ messages in thread
From: Kevin Wolf @ 2009-07-24  9:18 UTC (permalink / raw)
  To: Stefan Weil; +Cc: QEMU Developers

Stefan Weil schrieb:
> This is a new block driver written from scratch
> to support the VDI format in QEMU.
> 
> VDI is the native format used by Innotek / SUN VirtualBox.
> 
> Signed-off-by: Stefan Weil <weil@mail.berlios.de>
> ---
>  Makefile      |    2 +-
>  block/vdi.c   | 1105 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  qemu-img.texi |    2 +
>  3 files changed, 1108 insertions(+), 1 deletions(-)
>  create mode 100644 block/vdi.c
> 
> diff --git a/Makefile b/Makefile
> index d8fa730..29f4a65 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -66,7 +66,7 @@ recurse-all: $(SUBDIR_RULES)
>  block-obj-y = cutils.o cache-utils.o qemu-malloc.o qemu-option.o module.o
>  block-obj-y += nbd.o block.o aio.o aes.o
>  
> -block-nested-y += cow.o qcow.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
> +block-nested-y += cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
>  block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
>  block-nested-y += parallels.o nbd.o
>  
> diff --git a/block/vdi.c b/block/vdi.c
> new file mode 100644
> index 0000000..0432446
> --- /dev/null
> +++ b/block/vdi.c
> @@ -0,0 +1,1105 @@
> +/*
> + * Block driver for the Virtual Disk Image (VDI) format
> + *
> + * Copyright (c) 2009 Stefan Weil
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 2 of the License, or
> + * (at your option) version 3 or any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Reference:
> + * http://forums.virtualbox.org/viewtopic.php?t=8046
> + *
> + * This driver supports create / read / write operations on VDI images.
> + *
> + * Todo (see also TODO in code):
> + *
> + * Some features like snapshots are still missing.
> + *
> + * Deallocation of zero-filled blocks and shrinking images are missing, too
> + * (might be added to common block layer).
> + *
> + * Allocation of blocks could be optimized (less writes to block map and
> + * header).
> + *
> + * Read and write of adjacents blocks could be done in one operation
> + * (current code uses one operation per block (1 MiB).
> + *
> + * The code is not thread safe (missing locks for changes in header and
> + * block table, no problem with current QEMU).
> + *
> + * Hints:
> + *
> + * Blocks (VDI documentation) correspond to clusters (QEMU).
> + * QEMU's backing files could be implemented using VDI snapshot files (TODO).
> + * VDI snapshot files may also contain the complete machine state.
> + * Maybe this machine state can be converted to QEMU PC machine snapshot data.
> + *
> + * The driver keeps a block cache (little endian entries) in memory.
> + * For the standard block size (1 MiB), a terrabyte disk will use 4 MiB RAM,

Tera, not terra. ;-)

> + * so this seems to be reasonable.
> + */
> +
> +#include "qemu-common.h"
> +#include "block_int.h"
> +#include "module.h"
> +
> +#if defined(HAVE_UUID_H)
> +#include <uuid/uuid.h>
> +#else
> +/* TODO: move uuid emulation to some central place in QEMU. */
> +#include "sysemu.h"     /* UUID_FMT */
> +typedef unsigned char uuid_t[16];
> +void uuid_generate(uuid_t out);
> +void uuid_unparse(uuid_t uu, char *out);
> +#endif
> +
> +/* Code configuration options. */
> +
> +/* Use old (synchronous) I/O. */
> +//~ #undef CONFIG_AIO
> +
> +/* Enable debug messages. */
> +//~ #define CONFIG_VDI_DEBUG
> +
> +/* Support write operations on VDI images. */
> +#define CONFIG_VDI_WRITE
> +
> +/* Support snapshot images (not implemented yet). */
> +//~ #define CONFIG_VDI_SNAPSHOT
> +
> +/* Enable (currently) unsupported features (not implemented yet). */
> +//~ #define CONFIG_VDI_UNSUPPORTED
> +
> +/* Support non-standard block (cluster) size. */
> +//~ #define CONFIG_VDI_BLOCK_SIZE

Actually, this is only about support for image creation. Any reason why
we shouldn't support creating images with non-standard block sizes? The
code already supports opening such images unconditionally, so the only
effect of turning it off for image creation is that we can't test that
functionality in qemu-iotests.

[Oh, sorry, actually there is a check in open which I missed at first.
Any reason why we can't support it? But it's consistent at least.]

> +/* Support static (pre-allocated) images. */
> +#define CONFIG_VDI_STATIC_IMAGE
> +
> +/* Command line option for static images. */
> +#define BLOCK_OPT_STATIC "static"

What about calling it "preallocate" and moving it to block_int.h? I
think this could make sense for other drivers, too.

> +
> +#define KiB     1024
> +#define MiB     (KiB * KiB)
> +
> +#define SECTOR_SIZE 512
> +
> +#if defined(CONFIG_VDI_DEBUG)
> +#define logout(fmt, ...) \
> +                fprintf(stderr, "vdi\t%-24s" fmt, __func__, ##__VA_ARGS__)
> +#else
> +#define logout(fmt, ...) ((void)0)
> +#endif
> +
> +/* Image signature. */
> +#define VDI_SIGNATURE 0xbeda107f
> +
> +/* Image version. */
> +#define VDI_VERSION_1_1 0x00010001
> +
> +/* Image type. */
> +#define VDI_TYPE_DYNAMIC 1
> +#define VDI_TYPE_STATIC  2
> +
> +/* Innotek / SUN images use these strings in header.text:
> + * "<<< innotek VirtualBox Disk Image >>>\n"
> + * "<<< Sun xVM VirtualBox Disk Image >>>\n"
> + * "<<< Sun VirtualBox Disk Image >>>\n"
> + * The value does not matter, so QEMU created images use a different text.
> + */
> +#define VDI_TEXT "<<< QEMU VM Virtual Disk Image >>>\n"
> +
> +/* Unallocated blocks use this index (no need to convert endianess). */
> +#define VDI_UNALLOCATED UINT32_MAX
> +
> +#if !defined(HAVE_UUID_H)
> +void uuid_generate(uuid_t out)
> +{
> +    memset(out, 0, sizeof(out));
> +}
> +
> +void uuid_unparse(uuid_t uu, char *out)
> +{
> +    snprintf(out, 37, UUID_FMT,
> +            uu[0], uu[1], uu[2], uu[3], uu[4], uu[5], uu[6], uu[7],
> +            uu[8], uu[9], uu[10], uu[11], uu[12], uu[13], uu[14], uu[15]);
> +}
> +#endif
> +
> +#if defined(CONFIG_AIO)
> +typedef struct {
> +    BlockDriverAIOCB common;
> +    int64_t sector_num;
> +    QEMUIOVector *qiov;
> +    uint8_t *buf;
> +    /* Total number of sectors. */
> +    int nb_sectors;
> +    /* Number of sectors for current AIO. */
> +    int n_sectors;
> +    /* New allocated block map entry. */
> +    uint32_t bmap_first;
> +    uint32_t bmap_last;
> +    /* Buffer for new allocated block. */
> +    void *block_buffer;
> +    void *orig_buf;
> +    int header_modified;
> +    BlockDriverAIOCB *hd_aiocb;
> +    struct iovec hd_iov;
> +    QEMUIOVector hd_qiov;
> +    QEMUBH *bh;
> +} VdiAIOCB;
> +#endif
> +
> +typedef struct {
> +    char text[0x40];
> +    uint32_t signature;
> +    uint32_t version;
> +    uint32_t header_size;
> +    uint32_t image_type;
> +    uint32_t image_flags;
> +    char description[256];
> +    uint32_t offset_bmap;
> +    uint32_t offset_data;
> +    uint32_t cylinders;         /* disk geometry, unused here */
> +    uint32_t heads;             /* disk geometry, unused here */
> +    uint32_t sectors;           /* disk geometry, unused here */

Is the geometry unused by VBox? If not, leaving it unused here is most
probably wrong. At least for image creation you need to fill the fields.

In the case of VHD, the geometry was the really significant thing. Using
the disk size in the header (which was inconsistent with the geometry)
meant that qemu-img convert to raw resulted in a virtual hard disk of
different size. You should check this for VDI.

> +    uint32_t sector_size;
> +    uint32_t unused1;
> +    uint64_t disk_size;
> +    uint32_t block_size;
> +    uint32_t block_extra;       /* unused here */
> +    uint32_t blocks_in_image;
> +    uint32_t blocks_allocated;
> +    uuid_t uuid_image;
> +    uuid_t uuid_last_snap;
> +    uuid_t uuid_link;
> +    uuid_t uuid_parent;
> +    uint64_t unused2[7];
> +} VdiHeader;
> +
> +typedef struct {
> +    BlockDriverState *hd;
> +    /* The block map entries are little endian (even in memory). */
> +    uint32_t *bmap;
> +    /* Size of block (bytes). */
> +    uint32_t block_size;
> +    /* Size of block (sectors). */
> +    uint32_t block_sectors;
> +    /* First sector of block map. */
> +    uint32_t bmap_sector;
> +    /* VDI header (converted to host endianess). */
> +    VdiHeader header;
> +} BDRVVdiState;
> +
> +static void vdi_header_to_cpu(VdiHeader *header)
> +{
> +    le32_to_cpus(&header->signature);
> +    le32_to_cpus(&header->version);
> +    le32_to_cpus(&header->header_size);
> +    le32_to_cpus(&header->image_type);
> +    le32_to_cpus(&header->image_flags);
> +    le32_to_cpus(&header->offset_bmap);
> +    le32_to_cpus(&header->offset_data);
> +    le32_to_cpus(&header->cylinders);
> +    le32_to_cpus(&header->heads);
> +    le32_to_cpus(&header->sectors);
> +    le32_to_cpus(&header->sector_size);
> +    le64_to_cpus(&header->disk_size);
> +    le32_to_cpus(&header->block_size);
> +    le32_to_cpus(&header->block_extra);
> +    le32_to_cpus(&header->blocks_in_image);
> +    le32_to_cpus(&header->blocks_allocated);
> +}
> +
> +static void vdi_header_to_le(VdiHeader *header)
> +{
> +    cpu_to_le32s(&header->signature);
> +    cpu_to_le32s(&header->version);
> +    cpu_to_le32s(&header->header_size);
> +    cpu_to_le32s(&header->image_type);
> +    cpu_to_le32s(&header->image_flags);
> +    cpu_to_le32s(&header->offset_bmap);
> +    cpu_to_le32s(&header->offset_data);
> +    cpu_to_le32s(&header->cylinders);
> +    cpu_to_le32s(&header->heads);
> +    cpu_to_le32s(&header->sectors);
> +    cpu_to_le32s(&header->sector_size);
> +    cpu_to_le64s(&header->disk_size);
> +    cpu_to_le32s(&header->block_size);
> +    cpu_to_le32s(&header->block_extra);
> +    cpu_to_le32s(&header->blocks_in_image);
> +    cpu_to_le32s(&header->blocks_allocated);
> +}
> +
> +#if defined(CONFIG_VDI_DEBUG)
> +static void vdi_header_print(VdiHeader *header)
> +{
> +    char uuid[37];
> +    logout("text        %s", header->text);
> +    logout("signature   0x%04x\n", header->signature);
> +    logout("header size 0x%04x\n", header->header_size);
> +    logout("image type  0x%04x\n", header->image_type);
> +    logout("image flags 0x%04x\n", header->image_flags);
> +    logout("description %s\n", header->description);
> +    logout("offset bmap 0x%04x\n", header->offset_bmap);
> +    logout("offset data 0x%04x\n", header->offset_data);
> +    logout("cylinders   0x%04x\n", header->cylinders);
> +    logout("heads       0x%04x\n", header->heads);
> +    logout("sectors     0x%04x\n", header->sectors);
> +    logout("sector size 0x%04x\n", header->sector_size);
> +    logout("image size  0x%" PRIx64 " B (%" PRIu64 " MiB)\n",
> +           header->disk_size, header->disk_size / MiB);
> +    logout("block size  0x%04x\n", header->block_size);
> +    logout("block extra 0x%04x\n", header->block_extra);
> +    logout("blocks tot. 0x%04x\n", header->blocks_in_image);
> +    logout("blocks all. 0x%04x\n", header->blocks_allocated);
> +    uuid_unparse(header->uuid_image, uuid);
> +    logout("uuid image  %s\n", uuid);
> +    uuid_unparse(header->uuid_last_snap, uuid);
> +    logout("uuid snap   %s\n", uuid);
> +    uuid_unparse(header->uuid_link, uuid);
> +    logout("uuid link   %s\n", uuid);
> +    uuid_unparse(header->uuid_parent, uuid);
> +    logout("uuid parent %s\n", uuid);
> +}
> +#endif
> +
> +static int vdi_check(BlockDriverState *bs)
> +{
> +    /* TODO: additional checks possible. */
> +    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
> +    int n_errors = 0;
> +    uint32_t blocks_allocated = 0;
> +    uint32_t block;
> +    uint32_t *bmap;
> +    logout("\n");
> +
> +    bmap = qemu_malloc(s->header.blocks_in_image * sizeof(uint32_t));
> +    memset(bmap, 0xff, s->header.blocks_in_image * sizeof(uint32_t));
> +
> +    /* Check block map and value of blocks_allocated. */
> +    for (block = 0; block < s->header.blocks_in_image; block++) {
> +        uint32_t bmap_entry = le32_to_cpu(s->bmap[block]);
> +        if (bmap_entry != VDI_UNALLOCATED) {
> +            if (bmap_entry < s->header.blocks_in_image) {
> +                blocks_allocated++;
> +                if (bmap[bmap_entry] == VDI_UNALLOCATED) {
> +                    bmap[bmap_entry] = bmap_entry;
> +                } else {
> +                    fprintf(stderr, "ERROR: block index %" PRIu32
> +                            " also used by %" PRIu32 "\n", bmap[bmap_entry], bmap_entry);
> +                }
> +            } else {
> +                fprintf(stderr, "ERROR: block index %" PRIu32
> +                        " too large, is %" PRIu32 "\n", block, bmap_entry);
> +                n_errors++;
> +            }
> +        }
> +    }
> +    if (blocks_allocated != s->header.blocks_allocated) {
> +        fprintf(stderr, "ERROR: allocated blocks mismatch, is %" PRIu32
> +               ", should be %" PRIu32 "\n",
> +               blocks_allocated, s->header.blocks_allocated);
> +        n_errors++;
> +    }
> +
> +    qemu_free(bmap);
> +
> +    return n_errors;
> +}
> +
> +static int vdi_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
> +{
> +    /* TODO: vdi_get_info would be needed for machine snapshots.
> +       vm_state_offset is still missing. */
> +    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
> +    logout("\n");
> +    bdi->cluster_size = s->block_size;
> +    bdi->vm_state_offset = 0;
> +    return 0;
> +}
> +
> +static int vdi_make_empty(BlockDriverState *bs)
> +{
> +    /* TODO: missing code. */
> +    logout("\n");
> +    return 0;
> +}

If you don't implement it, leave it out. Setting
bdrv_vdi.bdrv_make_empty != NULL means that you claim to have that
functionality.

> +
> +static int vdi_probe(const uint8_t *buf, int buf_size, const char *filename)
> +{
> +    const VdiHeader *header = (const VdiHeader *)buf;
> +    int result = 0;
> +
> +    logout("\n");
> +
> +    if (buf_size < sizeof(*header)) {
> +        /* Header too small, no VDI. */
> +    } else if (le32_to_cpu(header->signature) == VDI_SIGNATURE) {
> +        result = 100;
> +    }
> +
> +    if (result == 0) {
> +        logout("no vdi image\n");
> +    } else {
> +        logout("%s", header->text);
> +    }
> +
> +    return result;
> +}
> +
> +#if defined(CONFIG_VDI_SNAPSHOT)
> +static int vdi_snapshot_create(const char *filename, const char *backing_file)
> +{
> +    /* TODO: missing code. */
> +    logout("\n");
> +    return -1;
> +}
> +#endif

I don't like such stubs. But at least they are guarded by #ifdef here...

> +
> +static int vdi_open(BlockDriverState *bs, const char *filename, int flags)
> +{
> +    BDRVVdiState *s = bs->opaque;
> +    VdiHeader header;
> +    size_t bmap_size;
> +    int ret;
> +
> +    logout("\n");
> +
> +    ret = bdrv_file_open(&s->hd, filename, flags);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    if (bdrv_read(s->hd, 0, (uint8_t *)&header, 1) < 0) {
> +        goto fail;
> +    }
> +
> +    vdi_header_to_cpu(&header);
> +#if defined(CONFIG_VDI_DEBUG)
> +    vdi_header_print(&header);
> +#endif
> +
> +    if (header.version != VDI_VERSION_1_1) {
> +        logout("unsupported version %u.%u\n",
> +               header.version >> 16, header.version & 0xffff);
> +        goto fail;
> +    } else if (header.offset_bmap % SECTOR_SIZE != 0) {
> +        /* We only support block maps which start on a sector boundary. */
> +        logout("unsupported block map offset 0x%x B\n", header.offset_bmap);
> +        goto fail;
> +    } else if (header.offset_data % SECTOR_SIZE != 0) {
> +        /* We only support data blocks which start on a sector boundary. */
> +        logout("unsupported data offset 0x%x B\n", header.offset_data);
> +        goto fail;
> +    } else if (header.sector_size != SECTOR_SIZE) {
> +        logout("unsupported sector size %u B\n", header.sector_size);
> +        goto fail;
> +    } else if (header.block_size != 1 * MiB) {
> +        logout("unsupported block size %u B\n", header.block_size);
> +        goto fail;
> +    } else if (header.disk_size !=
> +               (uint64_t)header.blocks_in_image * header.block_size) {
> +        logout("unexpected block number %u B\n", header.blocks_in_image);
> +        goto fail;
> +    }
> +
> +    bs->total_sectors = header.disk_size / SECTOR_SIZE;
> +
> +    s->block_size = header.block_size;
> +    s->block_sectors = header.block_size / SECTOR_SIZE;
> +    s->bmap_sector = header.offset_bmap / SECTOR_SIZE;
> +    s->header = header;
> +
> +    bmap_size = header.blocks_in_image * sizeof(uint32_t);
> +    s->bmap = qemu_malloc(bmap_size);
> +    if (bdrv_read(s->hd, s->bmap_sector,
> +                  (uint8_t *)s->bmap, bmap_size / SECTOR_SIZE) < 0) {
> +        goto fail_free_bmap;
> +    }
> +
> +    return 0;
> +
> + fail_free_bmap:
> +    qemu_free(s->bmap);
> +
> + fail:
> +    bdrv_delete(s->hd);
> +    return -1;
> +}
> +
> +static int vdi_is_allocated(BlockDriverState *bs, int64_t sector_num,
> +                             int nb_sectors, int *pnum)
> +{
> +    /* TODO: Check for too large sector_num (in bdrv_is_allocated or here). */
> +    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
> +    size_t bmap_index = sector_num / s->block_sectors;
> +    size_t sector_in_block = sector_num % s->block_sectors;
> +    int n_sectors = s->block_sectors - sector_in_block;
> +    uint32_t bmap_entry = le32_to_cpu(s->bmap[bmap_index]);
> +    logout("%p, %" PRId64 ", %d, %p\n", bs, sector_num, nb_sectors, pnum);
> +    if (n_sectors > nb_sectors) {
> +        n_sectors = nb_sectors;
> +    }
> +    *pnum = n_sectors;
> +    return bmap_entry != VDI_UNALLOCATED;
> +}
> +
> +#if defined(CONFIG_AIO)
> +
> +#if 0

I guess you should remove this block before the patch is included.

> +static void vdi_aio_remove(VdiAIOCB *acb)
> +{
> +    logout("\n");
> +#if 0
> +    VdiAIOCB **pacb;
> +
> +    /* remove the callback from the queue */
> +    pacb = &posix_aio_state->first_aio;
> +    for(;;) {
> +        if (*pacb == NULL) {
> +            fprintf(stderr, "vdi_aio_remove: aio request not found!\n");
> +            break;
> +        } else if (*pacb == acb) {
> +            *pacb = acb->next;
> +            qemu_aio_release(acb);
> +            break;
> +        }
> +        pacb = &(*pacb)->next;
> +    }
> +#endif
> +}
> +#endif
> +
> +static void vdi_aio_cancel(BlockDriverAIOCB *blockacb)
> +{
> +    logout("\n");
> +
> +#if 0
> +    int ret;
> +    VdiAIOCB *acb = (VdiAIOCB *)blockacb;
> +
> +    ret = qemu_paio_cancel(acb->aiocb.aio_fildes, &acb->aiocb);
> +    if (ret == QEMU_PAIO_NOTCANCELED) {
> +        /* fail safe: if the aio could not be canceled, we wait for
> +           it */
> +        while (qemu_paio_error(&acb->aiocb) == EINPROGRESS);
> +    }
> +
> +    vdi_aio_remove(acb);
> +#endif
> +}
> +
> +static AIOPool vdi_aio_pool = {
> +    .aiocb_size = sizeof(VdiAIOCB),
> +    .cancel = vdi_aio_cancel,
> +};
> +
> +static VdiAIOCB *vdi_aio_setup(BlockDriverState *bs, int64_t sector_num,
> +        QEMUIOVector *qiov, int nb_sectors,
> +        BlockDriverCompletionFunc *cb, void *opaque, int is_write)
> +{
> +    VdiAIOCB *acb;
> +
> +    logout("%p, %" PRId64 ", %p, %d, %p, %p, %d\n",
> +           bs, sector_num, qiov, nb_sectors, cb, opaque, is_write);
> +
> +    acb = qemu_aio_get(&vdi_aio_pool, bs, cb, opaque);
> +    if (acb) {
> +        acb->hd_aiocb = NULL;
> +        acb->sector_num = sector_num;
> +        acb->qiov = qiov;
> +        if (qiov->niov > 1) {
> +            acb->buf = qemu_blockalign(bs, qiov->size);
> +            acb->orig_buf = acb->buf;
> +            if (is_write) {
> +                qemu_iovec_to_buffer(qiov, acb->buf);
> +            }
> +        } else {
> +            acb->buf = (uint8_t *)qiov->iov->iov_base;
> +        }
> +        acb->nb_sectors = nb_sectors;
> +        acb->n_sectors = 0;
> +        acb->bmap_first = VDI_UNALLOCATED;
> +        acb->bmap_last = VDI_UNALLOCATED;
> +        acb->block_buffer = NULL;
> +        acb->header_modified = 0;
> +    }
> +    return acb;
> +}
> +
> +static int vdi_schedule_bh(QEMUBHFunc *cb, VdiAIOCB *acb)
> +{
> +    logout("\n");
> +
> +    if (acb->bh) {
> +        return -EIO;
> +    }
> +
> +    acb->bh = qemu_bh_new(cb, acb);
> +    if (!acb->bh) {
> +        return -EIO;
> +    }
> +
> +    qemu_bh_schedule(acb->bh);
> +
> +    return 0;
> +}
> +
> +static void vdi_aio_read_cb(void *opaque, int ret);
> +
> +static void vdi_aio_read_bh(void *opaque)
> +{
> +    VdiAIOCB *acb = opaque;
> +    logout("\n");
> +    qemu_bh_delete(acb->bh);
> +    acb->bh = NULL;
> +    vdi_aio_read_cb(opaque, 0);
> +}
> +
> +static void vdi_aio_read_cb(void *opaque, int ret)
> +{
> +    VdiAIOCB *acb = opaque;
> +    BlockDriverState *bs = acb->common.bs;
> +    BDRVVdiState *s = bs->opaque;
> +    uint32_t bmap_entry;
> +    uint32_t block_index;
> +    uint32_t sector_in_block;
> +    uint32_t n_sectors;
> +
> +    logout("%u sectors read\n", acb->n_sectors);
> +
> +    acb->hd_aiocb = NULL;
> +
> +    if (ret < 0) {
> +        goto done;
> +    }
> +
> +    acb->nb_sectors -= acb->n_sectors;
> +
> +    if (acb->nb_sectors == 0) {
> +        /* request completed */
> +        ret = 0;
> +        goto done;
> +    }
> +
> +    acb->sector_num += acb->n_sectors;
> +    acb->buf += acb->n_sectors * SECTOR_SIZE;
> +
> +    block_index = acb->sector_num / s->block_sectors;
> +    sector_in_block = acb->sector_num % s->block_sectors;
> +    n_sectors = s->block_sectors - sector_in_block;
> +    if (n_sectors > acb->nb_sectors) {
> +        n_sectors = acb->nb_sectors;
> +    }
> +
> +    logout("will read %u sectors starting at sector %" PRIu64 "\n",
> +           n_sectors, acb->sector_num);
> +
> +    /* prepare next AIO request */
> +    acb->n_sectors = n_sectors;
> +    bmap_entry = le32_to_cpu(s->bmap[block_index]);
> +    if (bmap_entry == VDI_UNALLOCATED) {
> +        /* Block not allocated, return zeros, no need to wait. */
> +        memset(acb->buf, 0, n_sectors * SECTOR_SIZE);
> +        ret = vdi_schedule_bh(vdi_aio_read_bh, acb);
> +        if (ret < 0) {
> +            goto done;
> +        }
> +    } else {
> +        uint64_t offset = s->header.offset_data / SECTOR_SIZE +
> +                          (uint64_t)bmap_entry * s->block_sectors +
> +                          sector_in_block;
> +        acb->hd_iov.iov_base = (void *)acb->buf;
> +        acb->hd_iov.iov_len = n_sectors * SECTOR_SIZE;
> +        qemu_iovec_init_external(&acb->hd_qiov, &acb->hd_iov, 1);
> +        acb->hd_aiocb = bdrv_aio_readv(s->hd, offset, &acb->hd_qiov,
> +                                       n_sectors, vdi_aio_read_cb, acb);
> +        if (acb->hd_aiocb == NULL) {
> +            goto done;
> +        }
> +    }
> +    return;
> +done:
> +    if (acb->qiov->niov > 1) {
> +        qemu_iovec_from_buffer(acb->qiov, acb->orig_buf, acb->qiov->size);
> +        qemu_vfree(acb->orig_buf);
> +    }
> +    acb->common.cb(acb->common.opaque, ret);
> +    qemu_aio_release(acb);
> +}
> +
> +static BlockDriverAIOCB *vdi_aio_readv(BlockDriverState *bs,
> +        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
> +        BlockDriverCompletionFunc *cb, void *opaque)
> +{
> +    VdiAIOCB *acb;
> +    logout("\n");
> +    acb = vdi_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 0);
> +    if (!acb) {
> +        return NULL;
> +    }
> +    vdi_aio_read_cb(acb, 0);
> +    return &acb->common;
> +}
> +
> +static void vdi_aio_write_cb(void *opaque, int ret)
> +{
> +    VdiAIOCB *acb = opaque;
> +    BlockDriverState *bs = acb->common.bs;
> +    BDRVVdiState *s = bs->opaque;
> +    uint32_t bmap_entry;
> +    uint32_t block_index;
> +    uint32_t sector_in_block;
> +    uint32_t n_sectors;
> +
> +    acb->hd_aiocb = NULL;
> +
> +    if (ret < 0) {
> +        goto done;
> +    }
> +
> +    acb->nb_sectors -= acb->n_sectors;
> +    acb->sector_num += acb->n_sectors;
> +    acb->buf += acb->n_sectors * SECTOR_SIZE;
> +
> +    if (acb->nb_sectors == 0) {
> +        logout("finished data write\n");
> +        acb->n_sectors = 0;
> +        if (acb->header_modified) {
> +            VdiHeader *header = acb->block_buffer;
> +            logout("now writing modified header\n");
> +            assert(acb->bmap_first != VDI_UNALLOCATED);
> +            *header = s->header;
> +            vdi_header_to_le(header);
> +            acb->header_modified = 0;
> +            acb->hd_iov.iov_base = acb->block_buffer;
> +            acb->hd_iov.iov_len = SECTOR_SIZE;
> +            qemu_iovec_init_external(&acb->hd_qiov, &acb->hd_iov, 1);
> +            acb->hd_aiocb = bdrv_aio_writev(s->hd, 0, &acb->hd_qiov, 1,
> +                                            vdi_aio_write_cb, acb);
> +            if (acb->hd_aiocb == NULL) {
> +                goto done;
> +            }
> +            return;
> +        } else if (acb->bmap_first != VDI_UNALLOCATED) {
> +            /* One or more new blocks were allocated. */
> +            uint64_t offset;
> +            uint32_t bmap_first;
> +            uint32_t bmap_last;
> +            qemu_free(acb->block_buffer);
> +            acb->block_buffer = NULL;
> +            bmap_first = acb->bmap_first;
> +            bmap_last = acb->bmap_last;
> +            logout("now writing modified block map entry %u...%u\n",
> +                   bmap_first, bmap_last);
> +            /* Write modified sectors from block map. */
> +            bmap_first /= (SECTOR_SIZE / sizeof(uint32_t));
> +            bmap_last /= (SECTOR_SIZE / sizeof(uint32_t));
> +            n_sectors = bmap_last - bmap_first + 1;
> +            offset = s->bmap_sector + bmap_first;
> +            acb->bmap_first = VDI_UNALLOCATED;
> +            acb->hd_iov.iov_base = (uint8_t *)&s->bmap[0] +
> +                                   bmap_first * SECTOR_SIZE;
> +            acb->hd_iov.iov_len = n_sectors * SECTOR_SIZE;
> +            qemu_iovec_init_external(&acb->hd_qiov, &acb->hd_iov, 1);
> +            logout("will write %u block map sectors starting from entry %u\n",
> +                   n_sectors, bmap_first);
> +            acb->hd_aiocb = bdrv_aio_writev(s->hd, offset, &acb->hd_qiov,
> +                                            n_sectors, vdi_aio_write_cb, acb);
> +            if (acb->hd_aiocb == NULL) {
> +                goto done;
> +            }
> +            return;
> +        }
> +        ret = 0;
> +        goto done;
> +    }
> +
> +    logout("%u sectors written\n", acb->n_sectors);
> +
> +    block_index = acb->sector_num / s->block_sectors;
> +    sector_in_block = acb->sector_num % s->block_sectors;
> +    n_sectors = s->block_sectors - sector_in_block;
> +    if (n_sectors > acb->nb_sectors) {
> +        n_sectors = acb->nb_sectors;
> +    }
> +
> +    logout("will write %u sectors starting at sector %" PRIu64 "\n",
> +           n_sectors, acb->sector_num);
> +
> +    /* prepare next AIO request */
> +    acb->n_sectors = n_sectors;
> +    bmap_entry = le32_to_cpu(s->bmap[block_index]);
> +    if (bmap_entry == VDI_UNALLOCATED) {
> +        /* Allocate new block and write to it. */
> +        uint64_t offset;
> +        uint8_t *block;
> +        bmap_entry = s->header.blocks_allocated;
> +        s->bmap[block_index] = cpu_to_le32(bmap_entry);
> +        s->header.blocks_allocated++;
> +        offset = s->header.offset_data / SECTOR_SIZE +
> +                 (uint64_t)bmap_entry * s->block_sectors;
> +        block = acb->block_buffer;
> +        if (block == NULL) {
> +            block = qemu_mallocz(s->block_size);
> +            acb->block_buffer = block;
> +            acb->bmap_first = block_index;
> +            assert(!acb->header_modified);
> +            acb->header_modified = 1;
> +        }
> +        acb->bmap_last = block_index;
> +        memcpy(block + sector_in_block * SECTOR_SIZE,
> +               acb->buf, n_sectors * SECTOR_SIZE);
> +        acb->hd_iov.iov_base = block;
> +        acb->hd_iov.iov_len = s->block_size;
> +        qemu_iovec_init_external(&acb->hd_qiov, &acb->hd_iov, 1);
> +        acb->hd_aiocb = bdrv_aio_writev(s->hd, offset,
> +                                        &acb->hd_qiov, s->block_sectors,
> +                                        vdi_aio_write_cb, acb);
> +        if (acb->hd_aiocb == NULL) {
> +            goto done;
> +        }
> +    } else {
> +        uint64_t offset = s->header.offset_data / SECTOR_SIZE +
> +                          (uint64_t)bmap_entry * s->block_sectors +
> +                          sector_in_block;
> +        acb->hd_iov.iov_base = acb->buf;
> +        acb->hd_iov.iov_len = n_sectors * SECTOR_SIZE;
> +        qemu_iovec_init_external(&acb->hd_qiov, &acb->hd_iov, 1);
> +        acb->hd_aiocb = bdrv_aio_writev(s->hd, offset, &acb->hd_qiov,
> +                                        n_sectors, vdi_aio_write_cb, acb);
> +        if (acb->hd_aiocb == NULL) {
> +            goto done;
> +        }
> +    }
> +
> +    return;
> +
> +done:
> +    if (acb->qiov->niov > 1) {
> +        qemu_vfree(acb->orig_buf);
> +    }
> +    acb->common.cb(acb->common.opaque, ret);
> +    qemu_aio_release(acb);
> +}
> +
> +static BlockDriverAIOCB *vdi_aio_writev(BlockDriverState *bs,
> +        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
> +        BlockDriverCompletionFunc *cb, void *opaque)
> +{
> +    VdiAIOCB *acb;
> +    logout("\n");
> +    acb = vdi_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 1);
> +    if (!acb) {
> +        return NULL;
> +    }
> +    vdi_aio_write_cb(acb, 0);
> +    return &acb->common;
> +}
> +
> +#else /* CONFIG_AIO */

No reason to retain the old code. It's just duplicated code that is
disabled by default and therefore likely to break soon.

> +
> +static int vdi_read(BlockDriverState *bs, int64_t sector_num,
> +                    uint8_t *buf, int nb_sectors)
> +{
> +    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
> +    logout("%p, %" PRId64 ", %p, %d\n", bs, sector_num, buf, nb_sectors);
> +    if (sector_num < 0) {
> +        logout("unsupported sector %" PRId64 "\n", sector_num);
> +        return -1;
> +    }
> +    while (nb_sectors > 0 && sector_num < bs->total_sectors) {
> +        uint32_t bmap_entry;
> +        size_t block_index = sector_num / s->block_sectors;
> +        size_t sector_in_block = sector_num % s->block_sectors;
> +        size_t n_sectors = s->block_sectors - sector_in_block;
> +        if (n_sectors > nb_sectors) {
> +            n_sectors = nb_sectors;
> +        }
> +        bmap_entry = le32_to_cpu(s->bmap[block_index]);
> +        if (bmap_entry == VDI_UNALLOCATED) {
> +            /* Block not allocated, return zeros. */
> +            memset(buf, 0, n_sectors * SECTOR_SIZE);
> +        } else {
> +            uint64_t offset = s->header.offset_data / SECTOR_SIZE +
> +                (uint64_t)bmap_entry * s->block_sectors + sector_in_block;
> +            if (bdrv_read(s->hd, offset, buf, n_sectors) < 0) {
> +                logout("read error\n");
> +                return -1;
> +            }
> +        }
> +        buf += n_sectors * SECTOR_SIZE;
> +        sector_num += n_sectors;
> +        nb_sectors -= n_sectors;
> +    }
> +    return 0;
> +}
> +
> +#if defined(CONFIG_VDI_WRITE)
> +static int vdi_write(BlockDriverState *bs, int64_t sector_num,
> +                     const uint8_t *buf, int nb_sectors)
> +{
> +    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
> +    logout("%p, %" PRId64 ", %p, %d\n", bs, sector_num, buf, nb_sectors);
> +    if (sector_num < 0) {
> +        logout("unsupported sector %" PRId64 "\n", sector_num);
> +        return -1;
> +    }
> +    while (nb_sectors > 0 && sector_num < bs->total_sectors) {
> +        uint32_t bmap_entry;
> +        uint64_t offset;
> +        size_t block_index = sector_num / s->block_sectors;
> +        size_t sector_in_block = sector_num % s->block_sectors;
> +        size_t n_sectors = s->block_sectors - sector_in_block;
> +        if (n_sectors > nb_sectors) {
> +            n_sectors = nb_sectors;
> +        }
> +        bmap_entry = le32_to_cpu(s->bmap[block_index]);
> +        if (bmap_entry == VDI_UNALLOCATED) {
> +            /* Allocate new block and write to it. */
> +            VdiHeader header;
> +            uint8_t *block;
> +            bmap_entry = s->header.blocks_allocated;
> +            s->bmap[block_index] = cpu_to_le32(bmap_entry);
> +            s->header.blocks_allocated++;
> +            offset = s->header.offset_data / SECTOR_SIZE +
> +                     (uint64_t)bmap_entry * s->block_sectors;
> +            block = qemu_mallocz(s->block_size);
> +            memcpy(block + sector_in_block * SECTOR_SIZE,
> +                   buf, n_sectors * SECTOR_SIZE);
> +            if (bdrv_write(s->hd, offset, block, s->block_sectors) < 0) {
> +                qemu_free(block);
> +                logout("write error\n");
> +                return -1;
> +            }
> +            qemu_free(block);
> +
> +            /* Write modified sector from block map. */
> +            block_index /= (SECTOR_SIZE / sizeof(uint32_t));
> +            offset = s->bmap_sector + block_index;
> +            if (bdrv_write(s->hd, offset,
> +                           (uint8_t *)&s->bmap[bmap_entry], 1) < 0) {
> +                logout("write error\n");
> +                return -1;
> +            }
> +
> +            /* Write modified header (blocks_allocated). */
> +            header = s->header;
> +            vdi_header_to_le(&header);
> +            if (bdrv_write(s->hd, 0, (uint8_t *)&header, 1) < 0) {
> +                logout("write error\n");
> +                return -1;
> +            }
> +        } else {
> +            /* Write to existing block. */
> +            offset = s->header.offset_data / SECTOR_SIZE +
> +                (uint64_t)bmap_entry * s->block_sectors +
> +                sector_in_block;
> +            if (bdrv_write(s->hd, offset, buf, n_sectors) < 0) {
> +                logout("write error\n");
> +                return -1;
> +            }
> +        }
> +        buf += n_sectors * SECTOR_SIZE;
> +        sector_num += n_sectors;
> +        nb_sectors -= n_sectors;
> +    }
> +    return 0;
> +}
> +#endif /* CONFIG_VDI_WRITE */
> +
> +#endif /* CONFIG_AIO */
> +
> +static int vdi_create(const char *filename, QEMUOptionParameter *options)
> +{
> +    /* TODO: Support pre-allocated images. */
> +    int fd;
> +    int result = 0;
> +    uint64_t bytes = 0;
> +    uint32_t blocks;
> +    size_t block_size = 1 * MiB;
> +    uint32_t image_type = VDI_TYPE_DYNAMIC;
> +    VdiHeader header;
> +    size_t i;
> +    size_t bmap_size;
> +    uint32_t *bmap;
> +
> +    logout("\n");
> +
> +    /* Read out options. */
> +    while (options && options->name) {
> +        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
> +            bytes = options->value.n;
> +#if defined(CONFIG_VDI_BLOCK_SIZE)
> +        } else if (!strcmp(options->name, BLOCK_OPT_CLUSTER_SIZE)) {
> +            if (options->value.n) {
> +                /* TODO: Additional checks (SECTOR_SIZE * 2^n, ...). */
> +                block_size = options->value.n;
> +            }
> +#endif
> +#if defined(CONFIG_VDI_STATIC_IMAGE)
> +        } else if (!strcmp(options->name, BLOCK_OPT_STATIC)) {
> +            image_type = VDI_TYPE_STATIC;
> +#endif
> +        }
> +        options++;
> +    }
> +
> +    fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE,
> +              0644);
> +    if (fd < 0) {
> +        return -errno;
> +    }
> +
> +    blocks = bytes / block_size;
> +    bmap_size = blocks * sizeof(uint32_t);
> +    bmap_size = ((bmap_size + SECTOR_SIZE - 1) & ~(SECTOR_SIZE -1));
> +
> +    memset(&header, 0, sizeof(header));
> +    strcpy(header.text, VDI_TEXT);
> +    header.signature = VDI_SIGNATURE;
> +    header.version = VDI_VERSION_1_1;
> +    header.header_size = 0x180;
> +    header.image_type = image_type;
> +    header.offset_bmap = 0x200;
> +    header.offset_data = 0x200 + bmap_size;
> +    header.sector_size = SECTOR_SIZE;
> +    header.disk_size = bytes;
> +    header.block_size = block_size;
> +    header.blocks_in_image = blocks;
> +    uuid_generate(header.uuid_image);
> +    uuid_generate(header.uuid_last_snap);
> +#if 0
> +    uuid_generate(header.uuid_link);
> +    uuid_generate(header.uuid_parent);
> +#endif
> +#if defined(CONFIG_VDI_DEBUG)
> +    vdi_header_print(&header);
> +#endif
> +    vdi_header_to_le(&header);
> +    if (write(fd, &header, sizeof(header)) < 0) {
> +        result = -errno;
> +    }
> +
> +    bmap = (uint32_t *)qemu_mallocz(bmap_size);
> +    for (i = 0; i < blocks; i++) {
> +        bmap[i] = VDI_UNALLOCATED;
> +    }
> +    if (write(fd, bmap, bmap_size) < 0) {
> +        result = -errno;
> +    }
> +    qemu_free(bmap);
> +
> +    if (close(fd) < 0) {
> +        result = -errno;
> +    }
> +
> +    return result;
> +}
> +
> +static void vdi_close(BlockDriverState *bs)
> +{
> +    BDRVVdiState *s = bs->opaque;
> +    logout("\n");
> +    bdrv_delete(s->hd);
> +}
> +
> +static void vdi_flush(BlockDriverState *bs)
> +{
> +    BDRVVdiState *s = bs->opaque;
> +    logout("\n");
> +    bdrv_flush(s->hd);
> +}
> +
> +
> +static QEMUOptionParameter vdi_create_options[] = {
> +    {
> +        .name = BLOCK_OPT_SIZE,
> +        .type = OPT_SIZE,
> +        .help = "Virtual disk size"
> +    },
> +#if defined(CONFIG_VDI_BLOCK_SIZE)
> +    {
> +        .name = BLOCK_OPT_CLUSTER_SIZE,
> +        .type = OPT_SIZE,
> +        .help = "VDI cluster (block) size"
> +    },
> +#endif
> +#if defined(CONFIG_VDI_STATIC_IMAGE)
> +    {
> +        .name = BLOCK_OPT_STATIC,
> +        .type = OPT_FLAG,
> +        .help = "VDI static (pre-allocated) image"
> +    },
> +#endif
> +    { NULL }
> +};
> +
> +static BlockDriver bdrv_vdi = {
> +    .format_name = "vdi",
> +    .instance_size = sizeof(BDRVVdiState),
> +    .bdrv_probe = vdi_probe,
> +    .bdrv_open = vdi_open,
> +    .bdrv_close = vdi_close,
> +    .bdrv_create = vdi_create,
> +    .bdrv_flush = vdi_flush,
> +#if defined(CONFIG_VDI_UNSUPPORTED)
> +    .bdrv_getlength = vdi_getlength,
> +#endif
> +    .bdrv_is_allocated = vdi_is_allocated,
> +#if defined(CONFIG_VDI_UNSUPPORTED)
> +    .bdrv_set_key = vdi_set_key,
> +#endif
> +    .bdrv_make_empty = vdi_make_empty,
> +
> +#ifdef CONFIG_AIO
> +    .bdrv_aio_readv = vdi_aio_readv,
> +#if defined(CONFIG_VDI_WRITE)
> +    .bdrv_aio_writev = vdi_aio_writev,
> +#endif
> +#else
> +    .bdrv_read = vdi_read,
> +#if defined(CONFIG_VDI_WRITE)
> +    .bdrv_write = vdi_write,
> +#endif
> +#endif
> +
> +#if defined(CONFIG_VDI_UNSUPPORTED)
> +    .bdrv_write_compressed = vdi_write_compressed,
> +#endif

Does VDI support compression even theoretically?

> +
> +#if defined(CONFIG_VDI_SNAPSHOT)
> +    .bdrv_snapshot_create = vdi_snapshot_create,
> +    .bdrv_snapshot_goto = vdi_snapshot_goto,
> +    .bdrv_snapshot_delete = vdi_snapshot_delete,
> +    .bdrv_snapshot_list = vdi_snapshot_list,
> +#endif
> +    .bdrv_get_info = vdi_get_info,
> +
> +#if defined(CONFIG_VDI_UNSUPPORTED)
> +    .bdrv_put_buffer = vdi_put_buffer,
> +    .bdrv_get_buffer = vdi_get_buffer,
> +#endif
> +
> +    .create_options = vdi_create_options,
> +    .bdrv_check = vdi_check,
> +};
> +
> +static void bdrv_vdi_init(void)
> +{
> +    logout("\n");
> +    bdrv_register(&bdrv_vdi);
> +}
> +
> +block_init(bdrv_vdi_init);
> diff --git a/qemu-img.texi b/qemu-img.texi
> index 49d4e59..69e24b5 100644
> --- a/qemu-img.texi
> +++ b/qemu-img.texi
> @@ -48,6 +48,8 @@ Old QEMU image format. Left for compatibility.
>  User Mode Linux Copy On Write image format. Used to be the only growable
>  image format in QEMU. It is supported only for compatibility with
>  previous versions. It does not work on win32.
> +@item vdi
> +VirtualBox 1.1 compatible image format.
>  @item vmdk
>  VMware 3 and 4 compatible image format.
>  @item cloop

The actual read/write code looks good to me. However, I haven't tested
it, this is just by reading the code.

Kevin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Add new block driver for the VDI format (use aio)
  2009-07-24  9:18     ` Kevin Wolf
@ 2009-07-24 16:20       ` Stefan Weil
  2009-07-27  8:00         ` Kevin Wolf
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Weil @ 2009-07-24 16:20 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: QEMU Developers

Kevin Wolf schrieb:
> Stefan Weil schrieb:
>   
>> This is a new block driver written from scratch
>> to support the VDI format in QEMU.
>>
>> VDI is the native format used by Innotek / SUN VirtualBox.
>>
>> Signed-off-by: Stefan Weil <weil@mail.berlios.de>
>> ---
>>  Makefile      |    2 +-
>>  block/vdi.c   | 1105 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  qemu-img.texi |    2 +
>>  3 files changed, 1108 insertions(+), 1 deletions(-)
>>  create mode 100644 block/vdi.c
>>
>>     
...
>> + *
>> + * The driver keeps a block cache (little endian entries) in memory.
>> + * For the standard block size (1 MiB), a terrabyte disk will use 4 MiB RAM,
>>     
>
> Tera, not terra. ;-)
>   
Thanks. I'll replace it by "a 1 TiB disk" :-)

...
>> +/* Enable (currently) unsupported features (not implemented yet). */
>> +//~ #define CONFIG_VDI_UNSUPPORTED
>> +
>> +/* Support non-standard block (cluster) size. */
>> +//~ #define CONFIG_VDI_BLOCK_SIZE
>>     
>
> Actually, this is only about support for image creation. Any reason why
> we shouldn't support creating images with non-standard block sizes? The
> code already supports opening such images unconditionally, so the only
> effect of turning it off for image creation is that we can't test that
> functionality in qemu-iotests.
>
> [Oh, sorry, actually there is a check in open which I missed at first.
> Any reason why we can't support it? But it's consistent at least.]
>   

Multiples of 512 (SECTOR_SIZE) might work.

VirtualBox uses 1 MiB blocks, and I did not see options to create images
with different block sizes. Maybe they even don't support such images.
So I did not spend the time to test other block sizes.
Why implement things nobody needs?

>   
>> +/* Support static (pre-allocated) images. */
>> +#define CONFIG_VDI_STATIC_IMAGE
>> +
>> +/* Command line option for static images. */
>> +#define BLOCK_OPT_STATIC "static"
>>     
>
> What about calling it "preallocate" and moving it to block_int.h? I
> think this could make sense for other drivers, too.
>   

Yes, this would be reasonable if we had more drivers with support
for "preallocate".

The VDI documentation calls these images "static", and they prefer
dynamic images, so this static option is not really very important.


...

>   
>> +
>> +typedef struct {
>> +    char text[0x40];
>> +    uint32_t signature;
>> +    uint32_t version;
>> +    uint32_t header_size;
>> +    uint32_t image_type;
>> +    uint32_t image_flags;
>> +    char description[256];
>> +    uint32_t offset_bmap;
>> +    uint32_t offset_data;
>> +    uint32_t cylinders;         /* disk geometry, unused here */
>> +    uint32_t heads;             /* disk geometry, unused here */
>> +    uint32_t sectors;           /* disk geometry, unused here */
>>     
>
> Is the geometry unused by VBox? If not, leaving it unused here is most
> probably wrong. At least for image creation you need to fill the fields.
>
> In the case of VHD, the geometry was the really significant thing. Using
> the disk size in the header (which was inconsistent with the geometry)
> meant that qemu-img convert to raw resulted in a virtual hard disk of
> different size. You should check this for VDI.
>
>   

VirtualBox sets these values to zero, so my code does this, too.
They are unused, so neither QEMU nor the client can see them.

...

>> +
>> +static int vdi_make_empty(BlockDriverState *bs)
>> +{
>> +    /* TODO: missing code. */
>> +    logout("\n");
>> +    return 0;
>> +}
>>     
>
> If you don't implement it, leave it out. Setting
> bdrv_vdi.bdrv_make_empty != NULL means that you claim to have that
> functionality.
>
>   

I did not analyse what *_make_empty is supposed to do.
This is one of the details were hints of the block driver experts
would be helpful.

>> +
>> +static int vdi_probe(const uint8_t *buf, int buf_size, const char *filename)
>> +{
>> +    const VdiHeader *header = (const VdiHeader *)buf;
>> +    int result = 0;
>> +
>> +    logout("\n");
>> +
>> +    if (buf_size < sizeof(*header)) {
>> +        /* Header too small, no VDI. */
>> +    } else if (le32_to_cpu(header->signature) == VDI_SIGNATURE) {
>> +        result = 100;
>> +    }
>> +
>> +    if (result == 0) {
>> +        logout("no vdi image\n");
>> +    } else {
>> +        logout("%s", header->text);
>> +    }
>> +
>> +    return result;
>> +}
>> +
>> +#if defined(CONFIG_VDI_SNAPSHOT)
>> +static int vdi_snapshot_create(const char *filename, const char *backing_file)
>> +{
>> +    /* TODO: missing code. */
>> +    logout("\n");
>> +    return -1;
>> +}
>> +#endif
>>     
>
> I don't like such stubs. But at least they are guarded by #ifdef here...
>
>   
...

>> +
>> +#if defined(CONFIG_AIO)
>> +
>> +#if 0
>>     
>
> I guess you should remove this block before the patch is included.
>   

This is also one of the details were hints of the block driver experts
would be helpful as I did not understand this aio_remove / aio_cancel
mechanism.


...

>> +#else /* CONFIG_AIO */
>>     
>
> No reason to retain the old code. It's just duplicated code that is
> disabled by default and therefore likely to break soon.
>   

I agree. As soon as the rest is ok, this part of the code will be removed.

>   
>> +
>> +static int vdi_read(BlockDriverState *bs, int64_t sector_num,
>> +                    uint8_t *buf, int nb_sectors)
>> +{
>> +    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
>> +    logout("%p, %" PRId64 ", %p, %d\n", bs, sector_num, buf, nb_sectors);
>> +    if (sector_num < 0) {
>> +        logout("unsupported sector %" PRId64 "\n", sector_num);
>> +        return -1;
>> +    }
>> +    while (nb_sectors > 0 && sector_num < bs->total_sectors) {
>> +        uint32_t bmap_entry;
>> +        size_t block_index = sector_num / s->block_sectors;
>> +        size_t sector_in_block = sector_num % s->block_sectors;
>> +        size_t n_sectors = s->block_sectors - sector_in_block;
>> +        if (n_sectors > nb_sectors) {
>> +            n_sectors = nb_sectors;
>> +        }
>> +        bmap_entry = le32_to_cpu(s->bmap[block_index]);
>> +        if (bmap_entry == VDI_UNALLOCATED) {
>> +            /* Block not allocated, return zeros. */
>> +            memset(buf, 0, n_sectors * SECTOR_SIZE);
>> +        } else {
>> +            uint64_t offset = s->header.offset_data / SECTOR_SIZE +
>> +                (uint64_t)bmap_entry * s->block_sectors + sector_in_block;
>> +            if (bdrv_read(s->hd, offset, buf, n_sectors) < 0) {
>> +                logout("read error\n");
>> +
>>     
>   
...

> Does VDI support compression even theoretically?
>   

I think it would be possible to extend the specification
to support compression or encryption.

The official specification (as far as I know it) does not
support compression (nor encryption).

...
>>  @item vmdk
>>  VMware 3 and 4 compatible image format.
>>  @item cloop
>>     
>
> The actual read/write code looks good to me. However, I haven't tested
> it, this is just by reading the code.
>
> Kevin
>
>   


Thank you for this review.

Stefan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Add new block driver for the VDI format (use aio)
  2009-07-24 16:20       ` Stefan Weil
@ 2009-07-27  8:00         ` Kevin Wolf
  2009-07-27  9:23           ` Jamie Lokier
  2009-07-31 15:04           ` Christoph Hellwig
  0 siblings, 2 replies; 44+ messages in thread
From: Kevin Wolf @ 2009-07-27  8:00 UTC (permalink / raw)
  To: Stefan Weil; +Cc: QEMU Developers, Christoph Hellwig

Stefan Weil schrieb:
>>> +/* Enable (currently) unsupported features (not implemented yet). */
>>> +//~ #define CONFIG_VDI_UNSUPPORTED
>>> +
>>> +/* Support non-standard block (cluster) size. */
>>> +//~ #define CONFIG_VDI_BLOCK_SIZE
>>>     
>> Actually, this is only about support for image creation. Any reason why
>> we shouldn't support creating images with non-standard block sizes? The
>> code already supports opening such images unconditionally, so the only
>> effect of turning it off for image creation is that we can't test that
>> functionality in qemu-iotests.
>>
>> [Oh, sorry, actually there is a check in open which I missed at first.
>> Any reason why we can't support it? But it's consistent at least.]
>>   
> 
> Multiples of 512 (SECTOR_SIZE) might work.
> 
> VirtualBox uses 1 MiB blocks, and I did not see options to create images
> with different block sizes. Maybe they even don't support such images.
> So I did not spend the time to test other block sizes.
> Why implement things nobody needs?

Ok, that makes sense. Probably we should remove the #define completely
then. I mean, why creating images that nobody - not even we ourselves -
can read?

>>   
>>> +/* Support static (pre-allocated) images. */
>>> +#define CONFIG_VDI_STATIC_IMAGE
>>> +
>>> +/* Command line option for static images. */
>>> +#define BLOCK_OPT_STATIC "static"
>>>     
>> What about calling it "preallocate" and moving it to block_int.h? I
>> think this could make sense for other drivers, too.
>>   
> 
> Yes, this would be reasonable if we had more drivers with support
> for "preallocate".
> 
> The VDI documentation calls these images "static", and they prefer
> dynamic images, so this static option is not really very important.

I might consider implementing it for qcow2. Cluster allocation is the
really slow part, so having complete L1/L2 tables in place from the very
beginning could speed up things.

Though I guess that for static images typically not only metadata is
preallocated, but zeros are written for the whole disk content? Maybe we
could implement a three-way flag like preallocate=[no,metadata,data] and
let qemu-img handle the data part (writing zeros is the same for all
formats and would even work with raw).

>>> +static int vdi_make_empty(BlockDriverState *bs)
>>> +{
>>> +    /* TODO: missing code. */
>>> +    logout("\n");
>>> +    return 0;
>>> +}
>>>     
>> If you don't implement it, leave it out. Setting
>> bdrv_vdi.bdrv_make_empty != NULL means that you claim to have that
>> functionality.
>>
>>   
> 
> I did not analyse what *_make_empty is supposed to do.
> This is one of the details were hints of the block driver experts
> would be helpful.

It's used after committing to a backing file. qcow1 seems to be the only
format actually implementing it. It complete clears the L1/L2 tables (=
the block map for VDI) so that all accesses go to the backing file again
and it can shrink the image file.

>>> +
>>> +#if defined(CONFIG_AIO)
>>> +
>>> +#if 0
>>>     
>> I guess you should remove this block before the patch is included.
>>   
> 
> This is also one of the details were hints of the block driver experts
> would be helpful as I did not understand this aio_remove / aio_cancel
> mechanism.

I wouldn't consider myself an AIO expert and I don't want to tell you
something wrong, so maybe Christoph would be the right one here?

>> Does VDI support compression even theoretically?
>>   
> 
> I think it would be possible to extend the specification
> to support compression or encryption.
> 
> The official specification (as far as I know it) does not
> support compression (nor encryption).

Then remove the entry. The function vdi_write_compressed doesn't exist
and doesn't even make sense with the current specification. The same
applies for vdi_set_key.

Kevin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Add new block driver for the VDI format (use aio)
  2009-07-27  8:00         ` Kevin Wolf
@ 2009-07-27  9:23           ` Jamie Lokier
  2009-07-28  6:37             ` Amit Shah
  2009-07-31 15:04           ` Christoph Hellwig
  1 sibling, 1 reply; 44+ messages in thread
From: Jamie Lokier @ 2009-07-27  9:23 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: QEMU Developers, Christoph Hellwig

Kevin Wolf wrote:
> Though I guess that for static images typically not only metadata is
> preallocated, but zeros are written for the whole disk content? Maybe we
> could implement a three-way flag like preallocate=[no,metadata,data] and
> let qemu-img handle the data part (writing zeros is the same for all
> formats and would even work with raw).

Note that you can also preallocate space with posix_fallocate(), which
fills the file with zeros but (sometimes) doesn't take as long as
writing zeros.

Apparently it is almost essential when writing large files in small
pieces on Windows, and on Linux it is supported by the ext4
filesystem, but I haven't checked either claim.

-- Jamie

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Add new block driver for the VDI format (use aio)
  2009-07-27  9:23           ` Jamie Lokier
@ 2009-07-28  6:37             ` Amit Shah
  2009-07-28  8:34               ` Jamie Lokier
  0 siblings, 1 reply; 44+ messages in thread
From: Amit Shah @ 2009-07-28  6:37 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Kevin Wolf, QEMU Developers, Christoph Hellwig

On (Mon) Jul 27 2009 [10:23:19], Jamie Lokier wrote:
> Kevin Wolf wrote:
> > Though I guess that for static images typically not only metadata is
> > preallocated, but zeros are written for the whole disk content? Maybe we
> > could implement a three-way flag like preallocate=[no,metadata,data] and
> > let qemu-img handle the data part (writing zeros is the same for all
> > formats and would even work with raw).
> 
> Note that you can also preallocate space with posix_fallocate(), which
> fills the file with zeros but (sometimes) doesn't take as long as
> writing zeros.

It won't take as long as writing zeroes if the filesystem underneath has
support for fallocate(). ext4, btrfs, xfs have support for fallocate().

> Apparently it is almost essential when writing large files in small
> pieces on Windows, and on Linux it is supported by the ext4
> filesystem, but I haven't checked either claim.

I did some comparisons:

http://log.amitshah.net/2009/03/comparison-of-file-systems-and-speeding.html

http://log.amitshah.net/2009/04/re-comparing-file-systems.html

		Amit

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Add new block driver for the VDI format (use aio)
  2009-07-28  6:37             ` Amit Shah
@ 2009-07-28  8:34               ` Jamie Lokier
  2009-07-28  8:56                 ` Daniel P. Berrange
  0 siblings, 1 reply; 44+ messages in thread
From: Jamie Lokier @ 2009-07-28  8:34 UTC (permalink / raw)
  To: Amit Shah; +Cc: Kevin Wolf, QEMU Developers, Christoph Hellwig

Amit Shah wrote:
> On (Mon) Jul 27 2009 [10:23:19], Jamie Lokier wrote:
> > Kevin Wolf wrote:
> > > Though I guess that for static images typically not only metadata is
> > > preallocated, but zeros are written for the whole disk content? Maybe we
> > > could implement a three-way flag like preallocate=[no,metadata,data] and
> > > let qemu-img handle the data part (writing zeros is the same for all
> > > formats and would even work with raw).
> > 
> > Note that you can also preallocate space with posix_fallocate(), which
> > fills the file with zeros but (sometimes) doesn't take as long as
> > writing zeros.
> 
> It won't take as long as writing zeroes if the filesystem underneath has
> support for fallocate(). ext4, btrfs, xfs have support for fallocate().
> 
> > Apparently it is almost essential when writing large files in small
> > pieces on Windows, and on Linux it is supported by the ext4
> > filesystem, but I haven't checked either claim.
> 
> I did some comparisons:
> 
> http://log.amitshah.net/2009/03/comparison-of-file-systems-and-speeding.html
> 
> http://log.amitshah.net/2009/04/re-comparing-file-systems.html

There was some discussion of it on the rsync list, which is where I
learned it is important for NTFS performance on Windows.

By the way, why is fallocate() support being added to libvirt to
improve disk image creation, instead of to qemu-img?

-- Jamie

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Add new block driver for the VDI format (use aio)
  2009-07-28  8:34               ` Jamie Lokier
@ 2009-07-28  8:56                 ` Daniel P. Berrange
  2009-07-28  9:03                   ` Jamie Lokier
  0 siblings, 1 reply; 44+ messages in thread
From: Daniel P. Berrange @ 2009-07-28  8:56 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Amit Shah, Kevin Wolf, QEMU Developers, Christoph Hellwig

On Tue, Jul 28, 2009 at 09:34:52AM +0100, Jamie Lokier wrote:
> Amit Shah wrote:
> > On (Mon) Jul 27 2009 [10:23:19], Jamie Lokier wrote:
> > > Kevin Wolf wrote:
> > > > Though I guess that for static images typically not only metadata is
> > > > preallocated, but zeros are written for the whole disk content? Maybe we
> > > > could implement a three-way flag like preallocate=[no,metadata,data] and
> > > > let qemu-img handle the data part (writing zeros is the same for all
> > > > formats and would even work with raw).
> > > 
> > > Note that you can also preallocate space with posix_fallocate(), which
> > > fills the file with zeros but (sometimes) doesn't take as long as
> > > writing zeros.
> > 
> > It won't take as long as writing zeroes if the filesystem underneath has
> > support for fallocate(). ext4, btrfs, xfs have support for fallocate().
> > 
> > > Apparently it is almost essential when writing large files in small
> > > pieces on Windows, and on Linux it is supported by the ext4
> > > filesystem, but I haven't checked either claim.
> > 
> > I did some comparisons:
> > 
> > http://log.amitshah.net/2009/03/comparison-of-file-systems-and-speeding.html
> > 
> > http://log.amitshah.net/2009/04/re-comparing-file-systems.html
> 
> There was some discussion of it on the rsync list, which is where I
> learned it is important for NTFS performance on Windows.
> 
> By the way, why is fallocate() support being added to libvirt to
> improve disk image creation, instead of to qemu-img?

libvirt has to work with more than just QEMU.  Thus it has a built in
support for creating raw files, and this is where we added fallocate
support. It also has ability to call out to hypervisor specific tools
for creating non-raw formats. We support qemu-img, and qcow-create (the
latter from Xen)

Regards,
Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Add new block driver for the VDI format (use aio)
  2009-07-28  8:56                 ` Daniel P. Berrange
@ 2009-07-28  9:03                   ` Jamie Lokier
  2009-07-28  9:11                     ` Kevin Wolf
  0 siblings, 1 reply; 44+ messages in thread
From: Jamie Lokier @ 2009-07-28  9:03 UTC (permalink / raw)
  To: Daniel P. Berrange
  Cc: Amit Shah, Kevin Wolf, QEMU Developers, Christoph Hellwig

Daniel P. Berrange wrote:
> > By the way, why is fallocate() support being added to libvirt to
> > improve disk image creation, instead of to qemu-img?
> 
> libvirt has to work with more than just QEMU.  Thus it has a built in
> support for creating raw files, and this is where we added fallocate
> support. It also has ability to call out to hypervisor specific tools
> for creating non-raw formats. We support qemu-img, and qcow-create (the
> latter from Xen)

Ok.  It sounds like fallocate() support would be useful for qemu-img
too (even if libvirt is used to manage it, if it didn't produce it).

-- Jamie

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Add new block driver for the VDI format (use aio)
  2009-07-28  9:03                   ` Jamie Lokier
@ 2009-07-28  9:11                     ` Kevin Wolf
  0 siblings, 0 replies; 44+ messages in thread
From: Kevin Wolf @ 2009-07-28  9:11 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: Amit Shah, QEMU Developers, Christoph Hellwig

Jamie Lokier schrieb:
> Daniel P. Berrange wrote:
>>> By the way, why is fallocate() support being added to libvirt to
>>> improve disk image creation, instead of to qemu-img?
>> libvirt has to work with more than just QEMU.  Thus it has a built in
>> support for creating raw files, and this is where we added fallocate
>> support. It also has ability to call out to hypervisor specific tools
>> for creating non-raw formats. We support qemu-img, and qcow-create (the
>> latter from Xen)
> 
> Ok.  It sounds like fallocate() support would be useful for qemu-img
> too (even if libvirt is used to manage it, if it didn't produce it).

I agree, fallocate seems to be the right way to implement my suggestion
of a generic preallocate option in qemu-img. It's on my list, so I'm
going to work on it when I come back from vacation in two weeks.

Kevin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] add support for new option of vdi format
  2009-07-23 20:30   ` [Qemu-devel] [PATCH] add support for new option of vdi format Stefan Weil
  2009-07-23 20:34     ` [Qemu-devel] " Stefan Weil
@ 2009-07-31 14:59     ` Christoph Hellwig
  2009-08-13 16:53     ` Christoph Hellwig
  2 siblings, 0 replies; 44+ messages in thread
From: Christoph Hellwig @ 2009-07-31 14:59 UTC (permalink / raw)
  To: Stefan Weil; +Cc: Christoph Hellwig, QEMU Developers

On Thu, Jul 23, 2009 at 10:30:45PM +0200, Stefan Weil wrote:
> VDI supports an image option 'static'.
> Ignore "static=off" from qemu-img output.

I'll wait with applying this until we have agreement on the final name
for this option.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Add new block driver for the VDI format (use aio)
  2009-07-27  8:00         ` Kevin Wolf
  2009-07-27  9:23           ` Jamie Lokier
@ 2009-07-31 15:04           ` Christoph Hellwig
  2009-07-31 19:53             ` Stefan Weil
  1 sibling, 1 reply; 44+ messages in thread
From: Christoph Hellwig @ 2009-07-31 15:04 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: QEMU Developers, Christoph Hellwig

On Mon, Jul 27, 2009 at 10:00:34AM +0200, Kevin Wolf wrote:
> Ok, that makes sense. Probably we should remove the #define completely
> then. I mean, why creating images that nobody - not even we ourselves -
> can read?

I agree.  As mentioned during the previous rounds all these ifdef parts
of code that can only be compiled in/out by touching the source code are
really bad.  Either they are good enough to be enabled unconditionally
(or at least through configure if they require a library or similar) or
they are broken / useless enough to not bother.  If virtualbox only
supports 1k block size images and we do aswell there's no point in
carrying around this dead code.

> >> I guess you should remove this block before the patch is included.
> >>   
> > 
> > This is also one of the details were hints of the block driver experts
> > would be helpful as I did not understand this aio_remove / aio_cancel
> > mechanism.
> 
> I wouldn't consider myself an AIO expert and I don't want to tell you
> something wrong, so maybe Christoph would be the right one here?

#if 0 is a horrible way for hints.  Coments with XXX: or TODO: are much
better documentation.  I'll take a look at the aio implementation, but
I'm far from expert on the qemu aio code.

> > I think it would be possible to extend the specification
> > to support compression or encryption.
> > 
> > The official specification (as far as I know it) does not
> > support compression (nor encryption).
> 
> Then remove the entry. The function vdi_write_compressed doesn't exist
> and doesn't even make sense with the current specification. The same
> applies for vdi_set_key.

Seconded, keeping function stubs around just bloats and obsfucates the
code without reason.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Add new block driver for the VDI format (use aio)
  2009-07-23 20:29   ` [Qemu-devel] [PATCH] Add new block driver for the VDI format (use aio) Stefan Weil
  2009-07-24  9:18     ` Kevin Wolf
@ 2009-07-31 15:25     ` Anthony Liguori
  2009-07-31 18:27       ` Stefan Weil
  1 sibling, 1 reply; 44+ messages in thread
From: Anthony Liguori @ 2009-07-31 15:25 UTC (permalink / raw)
  To: Stefan Weil; +Cc: QEMU Developers

Stefan Weil wrote:
> This is a new block driver written from scratch
> to support the VDI format in QEMU.
>
> VDI is the native format used by Innotek / SUN VirtualBox.
>
> Signed-off-by: Stefan Weil <weil@mail.berlios.de>
> ---
>  Makefile      |    2 +-
>  block/vdi.c   | 1105 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  qemu-img.texi |    2 +
>  3 files changed, 1108 insertions(+), 1 deletions(-)
>  create mode 100644 block/vdi.c
>
> diff --git a/Makefile b/Makefile
> index d8fa730..29f4a65 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -66,7 +66,7 @@ recurse-all: $(SUBDIR_RULES)
>  block-obj-y = cutils.o cache-utils.o qemu-malloc.o qemu-option.o module.o
>  block-obj-y += nbd.o block.o aio.o aes.o
>  
> -block-nested-y += cow.o qcow.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
> +block-nested-y += cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
>  block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
>  block-nested-y += parallels.o nbd.o
>  
> diff --git a/block/vdi.c b/block/vdi.c
> new file mode 100644
> index 0000000..0432446
> --- /dev/null
> +++ b/block/vdi.c
> @@ -0,0 +1,1105 @@
> +/*
> + * Block driver for the Virtual Disk Image (VDI) format
> + *
> + * Copyright (c) 2009 Stefan Weil
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation, either version 2 of the License, or
> + * (at your option) version 3 or any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program.  If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Reference:
> + * http://forums.virtualbox.org/viewtopic.php?t=8046
> + *
> + * This driver supports create / read / write operations on VDI images.
> + *
> + * Todo (see also TODO in code):
> + *
> + * Some features like snapshots are still missing.
> + *
> + * Deallocation of zero-filled blocks and shrinking images are missing, too
> + * (might be added to common block layer).
> + *
> + * Allocation of blocks could be optimized (less writes to block map and
> + * header).
> + *
> + * Read and write of adjacents blocks could be done in one operation
> + * (current code uses one operation per block (1 MiB).
> + *
> + * The code is not thread safe (missing locks for changes in header and
> + * block table, no problem with current QEMU).
> + *
> + * Hints:
> + *
> + * Blocks (VDI documentation) correspond to clusters (QEMU).
> + * QEMU's backing files could be implemented using VDI snapshot files (TODO).
> + * VDI snapshot files may also contain the complete machine state.
> + * Maybe this machine state can be converted to QEMU PC machine snapshot data.
> + *
> + * The driver keeps a block cache (little endian entries) in memory.
> + * For the standard block size (1 MiB), a terrabyte disk will use 4 MiB RAM,
> + * so this seems to be reasonable.
> + */
> +
> +#include "qemu-common.h"
> +#include "block_int.h"
> +#include "module.h"
> +
> +#if defined(HAVE_UUID_H)
> +#include <uuid/uuid.h>
> +#else
> +/* TODO: move uuid emulation to some central place in QEMU. */
> +#include "sysemu.h"     /* UUID_FMT */
> +typedef unsigned char uuid_t[16];
> +void uuid_generate(uuid_t out);
> +void uuid_unparse(uuid_t uu, char *out);
> +#endif
> +
> +/* Code configuration options. */
> +
> +/* Use old (synchronous) I/O. */
> +//~ #undef CONFIG_AIO
>   

Please eliminate this define.  It just will lead to bitrot.


> +/* Enable debug messages. */
> +//~ #define CONFIG_VDI_DEBUG
> +
> +/* Support write operations on VDI images. */
> +#define CONFIG_VDI_WRITE
> +
> +/* Support snapshot images (not implemented yet). */
> +//~ #define CONFIG_VDI_SNAPSHOT
> +
> +/* Enable (currently) unsupported features (not implemented yet). */
> +//~ #define CONFIG_VDI_UNSUPPORTED
> +
> +/* Support non-standard block (cluster) size. */
> +//~ #define CONFIG_VDI_BLOCK_SIZE
> +
> +/* Support static (pre-allocated) images. */
> +#define CONFIG_VDI_STATIC_IMAGE
>   

Same thing for the rest of these.

> +/* Command line option for static images. */
> +#define BLOCK_OPT_STATIC "static"
> +
> +#define KiB     1024
> +#define MiB     (KiB * KiB)
> +
> +#define SECTOR_SIZE 512
> +
> +#if defined(CONFIG_VDI_DEBUG)
> +#define logout(fmt, ...) \
> +                fprintf(stderr, "vdi\t%-24s" fmt, __func__, ##__VA_ARGS__)
> +#else
> +#define logout(fmt, ...) ((void)0)
> +#endif
>   

do { } while (0) is better for these sort of things.

> +/* Image signature. */
> +#define VDI_SIGNATURE 0xbeda107f
> +
> +/* Image version. */
> +#define VDI_VERSION_1_1 0x00010001
> +
> +/* Image type. */
> +#define VDI_TYPE_DYNAMIC 1
> +#define VDI_TYPE_STATIC  2
> +
> +/* Innotek / SUN images use these strings in header.text:
> + * "<<< innotek VirtualBox Disk Image >>>\n"
> + * "<<< Sun xVM VirtualBox Disk Image >>>\n"
> + * "<<< Sun VirtualBox Disk Image >>>\n"
> + * The value does not matter, so QEMU created images use a different text.
> + */
> +#define VDI_TEXT "<<< QEMU VM Virtual Disk Image >>>\n"
>   

a static const char * is a bit nicer for this.

> +/* Unallocated blocks use this index (no need to convert endianess). */
> +#define VDI_UNALLOCATED UINT32_MAX
> +
> +#if !defined(HAVE_UUID_H)
> +void uuid_generate(uuid_t out)
> +{
> +    memset(out, 0, sizeof(out));
> +}
> +
> +void uuid_unparse(uuid_t uu, char *out)
> +{
> +    snprintf(out, 37, UUID_FMT,
> +            uu[0], uu[1], uu[2], uu[3], uu[4], uu[5], uu[6], uu[7],
> +            uu[8], uu[9], uu[10], uu[11], uu[12], uu[13], uu[14], uu[15]);
> +}
> +#endif
>   

Generating a 0 uuid seems odd to me.  Wouldn't it be better to depend 
unconditionally on libuuid?

> +static int vdi_make_empty(BlockDriverState *bs)
> +{
> +    /* TODO: missing code. */
> +    logout("\n");
> +    return 0;
> +}
>   

I'm not a big fan of stubs like this.

> +#if defined(CONFIG_AIO)
> +
> +#if 0
> +static void vdi_aio_remove(VdiAIOCB *acb)
> +{
> +    logout("\n");
> +#if 0
> +    VdiAIOCB **pacb;
> +
> +    /* remove the callback from the queue */
> +    pacb = &posix_aio_state->first_aio;
> +    for(;;) {
> +        if (*pacb == NULL) {
> +            fprintf(stderr, "vdi_aio_remove: aio request not found!\n");
> +            break;
> +        } else if (*pacb == acb) {
> +            *pacb = acb->next;
> +            qemu_aio_release(acb);
> +            break;
> +        }
> +        pacb = &(*pacb)->next;
> +    }
> +#endif
> +}
> +#endif
> +
> +static void vdi_aio_cancel(BlockDriverAIOCB *blockacb)
> +{
> +    logout("\n");
> +
> +#if 0
> +    int ret;
> +    VdiAIOCB *acb = (VdiAIOCB *)blockacb;
> +
> +    ret = qemu_paio_cancel(acb->aiocb.aio_fildes, &acb->aiocb);
> +    if (ret == QEMU_PAIO_NOTCANCELED) {
> +        /* fail safe: if the aio could not be canceled, we wait for
> +           it */
> +        while (qemu_paio_error(&acb->aiocb) == EINPROGRESS);
> +    }
> +
> +    vdi_aio_remove(acb);
> +#endif
> +}
>   

These really should not be #if 0'd.  Is there a bug here?

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Add new block driver for the VDI format (use aio)
  2009-07-31 15:25     ` Anthony Liguori
@ 2009-07-31 18:27       ` Stefan Weil
  2009-07-31 19:45         ` [Qemu-devel] [PATCH] Add new block driver for the VDI format (only aio supported) Stefan Weil
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Weil @ 2009-07-31 18:27 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: QEMU Developers

Anthony Liguori schrieb:
> Stefan Weil wrote:
>> This is a new block driver written from scratch
>> to support the VDI format in QEMU.
>>
>> VDI is the native format used by Innotek / SUN VirtualBox.
>>
>> Signed-off-by: Stefan Weil <weil@mail.berlios.de>
>> ---
>>  Makefile      |    2 +-
>>  block/vdi.c   | 1105
>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>  qemu-img.texi |    2 +
>>  3 files changed, 1108 insertions(+), 1 deletions(-)
>>  create mode 100644 block/vdi.c
>>
>> diff --git a/Makefile b/Makefile
>> index d8fa730..29f4a65 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -66,7 +66,7 @@ recurse-all: $(SUBDIR_RULES)
>>  block-obj-y = cutils.o cache-utils.o qemu-malloc.o qemu-option.o
>> module.o
>>  block-obj-y += nbd.o block.o aio.o aes.o
>>  
>> -block-nested-y += cow.o qcow.o vmdk.o cloop.o dmg.o bochs.o vpc.o
>> vvfat.o
>> +block-nested-y += cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o
>> vpc.o vvfat.o
>>  block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o
>> qcow2-snapshot.o
>>  block-nested-y += parallels.o nbd.o
>>  
>> diff --git a/block/vdi.c b/block/vdi.c
>> new file mode 100644
>> index 0000000..0432446
>> --- /dev/null
>> +++ b/block/vdi.c
>> @@ -0,0 +1,1105 @@
>> +/*
>> + * Block driver for the Virtual Disk Image (VDI) format
>> + *
>> + * Copyright (c) 2009 Stefan Weil
>> + *
>> + * This program is free software: you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation, either version 2 of the License, or
>> + * (at your option) version 3 or any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see
>> <http://www.gnu.org/licenses/>.
>> + *
>> + * Reference:
>> + * http://forums.virtualbox.org/viewtopic.php?t=8046
>> + *
>> + * This driver supports create / read / write operations on VDI images.
>> + *
>> + * Todo (see also TODO in code):
>> + *
>> + * Some features like snapshots are still missing.
>> + *
>> + * Deallocation of zero-filled blocks and shrinking images are
>> missing, too
>> + * (might be added to common block layer).
>> + *
>> + * Allocation of blocks could be optimized (less writes to block map
>> and
>> + * header).
>> + *
>> + * Read and write of adjacents blocks could be done in one operation
>> + * (current code uses one operation per block (1 MiB).
>> + *
>> + * The code is not thread safe (missing locks for changes in header and
>> + * block table, no problem with current QEMU).
>> + *
>> + * Hints:
>> + *
>> + * Blocks (VDI documentation) correspond to clusters (QEMU).
>> + * QEMU's backing files could be implemented using VDI snapshot
>> files (TODO).
>> + * VDI snapshot files may also contain the complete machine state.
>> + * Maybe this machine state can be converted to QEMU PC machine
>> snapshot data.
>> + *
>> + * The driver keeps a block cache (little endian entries) in memory.
>> + * For the standard block size (1 MiB), a terrabyte disk will use 4
>> MiB RAM,
>> + * so this seems to be reasonable.
>> + */
>> +
>> +#include "qemu-common.h"
>> +#include "block_int.h"
>> +#include "module.h"
>> +
>> +#if defined(HAVE_UUID_H)
>> +#include <uuid/uuid.h>
>> +#else
>> +/* TODO: move uuid emulation to some central place in QEMU. */
>> +#include "sysemu.h"     /* UUID_FMT */
>> +typedef unsigned char uuid_t[16];
>> +void uuid_generate(uuid_t out);
>> +void uuid_unparse(uuid_t uu, char *out);
>> +#endif
>> +
>> +/* Code configuration options. */
>> +
>> +/* Use old (synchronous) I/O. */
>> +//~ #undef CONFIG_AIO
>>   
>
> Please eliminate this define.  It just will lead to bitrot.

The latest patch (which I will send to the list soon) only contains the
aio code.

>
>
>> +/* Enable debug messages. */
>> +//~ #define CONFIG_VDI_DEBUG
>> +
>> +/* Support write operations on VDI images. */
>> +#define CONFIG_VDI_WRITE
>> +
>> +/* Support snapshot images (not implemented yet). */
>> +//~ #define CONFIG_VDI_SNAPSHOT
>> +
>> +/* Enable (currently) unsupported features (not implemented yet). */
>> +//~ #define CONFIG_VDI_UNSUPPORTED
>> +
>> +/* Support non-standard block (cluster) size. */
>> +//~ #define CONFIG_VDI_BLOCK_SIZE
>> +
>> +/* Support static (pre-allocated) images. */
>> +#define CONFIG_VDI_STATIC_IMAGE
>>   
>
> Same thing for the rest of these.

No static / fixed / pre-allocated images? This option should stay
in the code (maybe with the name changed).

Nor would I like to remove the block size option, as there might arise
a need for larger block sizes when the image size grows above
some tera bytes.

I see no reason why removing CONFIG_VDI_DEBUG might help.

The rest will be removed in the next patch.


>
>> +/* Command line option for static images. */
>> +#define BLOCK_OPT_STATIC "static"
>> +
>> +#define KiB     1024
>> +#define MiB     (KiB * KiB)
>> +
>> +#define SECTOR_SIZE 512
>> +
>> +#if defined(CONFIG_VDI_DEBUG)
>> +#define logout(fmt, ...) \
>> +                fprintf(stderr, "vdi\t%-24s" fmt, __func__,
>> ##__VA_ARGS__)
>> +#else
>> +#define logout(fmt, ...) ((void)0)
>> +#endif
>>   
>
> do { } while (0) is better for these sort of things.

If you have more than one statement in a define, do .. while is even
essential.

It does not matter if you have just one statement (like in this code),
and for empty statements, (void)0 is also very common.

>
>> +/* Image signature. */
>> +#define VDI_SIGNATURE 0xbeda107f
>> +
>> +/* Image version. */
>> +#define VDI_VERSION_1_1 0x00010001
>> +
>> +/* Image type. */
>> +#define VDI_TYPE_DYNAMIC 1
>> +#define VDI_TYPE_STATIC  2
>> +
>> +/* Innotek / SUN images use these strings in header.text:
>> + * "<<< innotek VirtualBox Disk Image >>>\n"
>> + * "<<< Sun xVM VirtualBox Disk Image >>>\n"
>> + * "<<< Sun VirtualBox Disk Image >>>\n"
>> + * The value does not matter, so QEMU created images use a different
>> text.
>> + */
>> +#define VDI_TEXT "<<< QEMU VM Virtual Disk Image >>>\n"
>>   
>
> a static const char * is a bit nicer for this.

Maybe. static const char [] even saves memory and code
(compared with static const char *) and should give
the same result as the define.

>
>> +/* Unallocated blocks use this index (no need to convert endianess). */
>> +#define VDI_UNALLOCATED UINT32_MAX
>> +
>> +#if !defined(HAVE_UUID_H)
>> +void uuid_generate(uuid_t out)
>> +{
>> +    memset(out, 0, sizeof(out));
>> +}
>> +
>> +void uuid_unparse(uuid_t uu, char *out)
>> +{
>> +    snprintf(out, 37, UUID_FMT,
>> +            uu[0], uu[1], uu[2], uu[3], uu[4], uu[5], uu[6], uu[7],
>> +            uu[8], uu[9], uu[10], uu[11], uu[12], uu[13], uu[14],
>> uu[15]);
>> +}
>> +#endif
>>   
>
> Generating a 0 uuid seems odd to me.  Wouldn't it be better to depend
> unconditionally on libuuid?

Do you think so? It might be difficult for some users to get libuuid,
especially on windows. For those users, a "uuid" of zero
won't matter when they use QEMU (the current driver only sets
uuid values during image creation and does not use them overwise).

It is possible to change the uuid values using tools supplied with
VirtualBox, so VirtualBox users also can handle those zero uuid
values.

As soon as more parts of QEMU will use libuuid code, it would be
reasonable to replace this code by code which generates real
uuid values (and move it to some other place).

>
>> +static int vdi_make_empty(BlockDriverState *bs)
>> +{
>> +    /* TODO: missing code. */
>> +    logout("\n");
>> +    return 0;
>> +}
>>   
>
> I'm not a big fan of stubs like this.

Nor am I. It's a copy from the qcow2 driver which is also a stub.
I know this is a bad excuse, but as I said before, I did not
fully understand all aspects of the block driver interface
(there is not too much documentation on it).

>
>> +#if defined(CONFIG_AIO)
>> +
>> +#if 0
>> +static void vdi_aio_remove(VdiAIOCB *acb)
>> +{
>> +    logout("\n");
>> +#if 0
>> +    VdiAIOCB **pacb;
>> +
>> +    /* remove the callback from the queue */
>> +    pacb = &posix_aio_state->first_aio;
>> +    for(;;) {
>> +        if (*pacb == NULL) {
>> +            fprintf(stderr, "vdi_aio_remove: aio request not
>> found!\n");
>> +            break;
>> +        } else if (*pacb == acb) {
>> +            *pacb = acb->next;
>> +            qemu_aio_release(acb);
>> +            break;
>> +        }
>> +        pacb = &(*pacb)->next;
>> +    }
>> +#endif
>> +}
>> +#endif
>> +
>> +static void vdi_aio_cancel(BlockDriverAIOCB *blockacb)
>> +{
>> +    logout("\n");
>> +
>> +#if 0
>> +    int ret;
>> +    VdiAIOCB *acb = (VdiAIOCB *)blockacb;
>> +
>> +    ret = qemu_paio_cancel(acb->aiocb.aio_fildes, &acb->aiocb);
>> +    if (ret == QEMU_PAIO_NOTCANCELED) {
>> +        /* fail safe: if the aio could not be canceled, we wait for
>> +           it */
>> +        while (qemu_paio_error(&acb->aiocb) == EINPROGRESS);
>> +    }
>> +
>> +    vdi_aio_remove(acb);
>> +#endif
>> +}
>>   
>
> These really should not be #if 0'd.  Is there a bug here?

I have replaced this part of the code now with a code copy
from the qcow2 driver, so the latest patch no longer uses
this #if 0 code.

>
> Regards,
>
> Anthony Liguori
>


Thanks for the review.

Regards
Stefan Weil

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH] Add new block driver for the VDI format (only aio supported)
  2009-07-31 18:27       ` Stefan Weil
@ 2009-07-31 19:45         ` Stefan Weil
  0 siblings, 0 replies; 44+ messages in thread
From: Stefan Weil @ 2009-07-31 19:45 UTC (permalink / raw)
  To: Anthony Liguori, QEMU Developers

This is a new block driver written from scratch
to support the VDI format in QEMU.

VDI is the native format used by Innotek / SUN VirtualBox.

Latest changes:

* stripped down version
  (code for synchronous operations and experimental code removed)

* don't open VDI snapshot images (with uuid_link or uuid_parent)

* modified vdi_aio_cancel

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
---
 Makefile      |    2 +-
 block/vdi.c   |  951 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 qemu-img.texi |    2 +
 3 files changed, 954 insertions(+), 1 deletions(-)
 create mode 100644 block/vdi.c

diff --git a/Makefile b/Makefile
index 382405a..3fed593 100644
--- a/Makefile
+++ b/Makefile
@@ -72,7 +72,7 @@ block-obj-y = cutils.o cache-utils.o qemu-malloc.o qemu-option.o module.o
 block-obj-y += nbd.o block.o aio.o aes.o
 block-obj-$(CONFIG_AIO) += posix-aio-compat.o
 
-block-nested-y += cow.o qcow.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
+block-nested-y += cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
 block-nested-y += parallels.o nbd.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
diff --git a/block/vdi.c b/block/vdi.c
new file mode 100644
index 0000000..db3fe16
--- /dev/null
+++ b/block/vdi.c
@@ -0,0 +1,951 @@
+/*
+ * Block driver for the Virtual Disk Image (VDI) format
+ *
+ * Copyright (c) 2009 Stefan Weil
+ *
+ * This program is free software: you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation, either version 2 of the License, or
+ * (at your option) version 3 or any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program.  If not, see <http://www.gnu.org/licenses/>.
+ *
+ * Reference:
+ * http://forums.virtualbox.org/viewtopic.php?t=8046
+ *
+ * This driver supports create / read / write operations on VDI images.
+ *
+ * Todo (see also TODO in code):
+ *
+ * Some features like snapshots are still missing.
+ *
+ * Deallocation of zero-filled blocks and shrinking images are missing, too
+ * (might be added to common block layer).
+ *
+ * Allocation of blocks could be optimized (less writes to block map and
+ * header).
+ *
+ * Read and write of adjacents blocks could be done in one operation
+ * (current code uses one operation per block (1 MiB).
+ *
+ * The code is not thread safe (missing locks for changes in header and
+ * block table, no problem with current QEMU).
+ *
+ * Hints:
+ *
+ * Blocks (VDI documentation) correspond to clusters (QEMU).
+ * QEMU's backing files could be implemented using VDI snapshot files (TODO).
+ * VDI snapshot files may also contain the complete machine state.
+ * Maybe this machine state can be converted to QEMU PC machine snapshot data.
+ *
+ * The driver keeps a block cache (little endian entries) in memory.
+ * For the standard block size (1 MiB), a 1 TiB disk will use 4 MiB RAM,
+ * so this seems to be reasonable.
+ */
+
+#include "qemu-common.h"
+#include "block_int.h"
+#include "module.h"
+
+#if defined(HAVE_UUID_H)
+#include <uuid/uuid.h>
+#else
+/* TODO: move uuid emulation to some central place in QEMU. */
+#include "sysemu.h"     /* UUID_FMT */
+typedef unsigned char uuid_t[16];
+void uuid_generate(uuid_t out);
+int uuid_is_null(const uuid_t uu);
+void uuid_unparse(const uuid_t uu, char *out);
+#endif
+
+/* Code configuration options. */
+
+/* Enable debug messages. */
+//~ #define CONFIG_VDI_DEBUG
+
+/* Support write operations on VDI images. */
+#define CONFIG_VDI_WRITE
+
+/* Support non-standard block (cluster) size. This is untested.
+ * Maybe it will be needed for very large images.
+ */
+//~ #define CONFIG_VDI_BLOCK_SIZE
+
+/* Support static (fixed, pre-allocated) images. */
+#define CONFIG_VDI_STATIC_IMAGE
+
+/* Command line option for static images. */
+#define BLOCK_OPT_STATIC "static"
+
+#define KiB     1024
+#define MiB     (KiB * KiB)
+
+#define SECTOR_SIZE 512
+
+#if defined(CONFIG_VDI_DEBUG)
+#define logout(fmt, ...) \
+                fprintf(stderr, "vdi\t%-24s" fmt, __func__, ##__VA_ARGS__)
+#else
+#define logout(fmt, ...) ((void)0)
+#endif
+
+/* Image signature. */
+#define VDI_SIGNATURE 0xbeda107f
+
+/* Image version. */
+#define VDI_VERSION_1_1 0x00010001
+
+/* Image type. */
+#define VDI_TYPE_DYNAMIC 1
+#define VDI_TYPE_STATIC  2
+
+/* Innotek / SUN images use these strings in header.text:
+ * "<<< innotek VirtualBox Disk Image >>>\n"
+ * "<<< Sun xVM VirtualBox Disk Image >>>\n"
+ * "<<< Sun VirtualBox Disk Image >>>\n"
+ * The value does not matter, so QEMU created images use a different text.
+ */
+#define VDI_TEXT "<<< QEMU VM Virtual Disk Image >>>\n"
+
+/* Unallocated blocks use this index (no need to convert endianess). */
+#define VDI_UNALLOCATED UINT32_MAX
+
+#if !defined(HAVE_UUID_H)
+void uuid_generate(uuid_t out)
+{
+    memset(out, 0, sizeof(out));
+}
+
+int uuid_is_null(const uuid_t uu)
+{
+    uuid_t null_uuid = { 0 };
+    return memcmp(uu, null_uuid, sizeof(uu)) == 0;
+}
+
+void uuid_unparse(const uuid_t uu, char *out)
+{
+    snprintf(out, 37, UUID_FMT,
+            uu[0], uu[1], uu[2], uu[3], uu[4], uu[5], uu[6], uu[7],
+            uu[8], uu[9], uu[10], uu[11], uu[12], uu[13], uu[14], uu[15]);
+}
+#endif
+
+typedef struct {
+    BlockDriverAIOCB common;
+    int64_t sector_num;
+    QEMUIOVector *qiov;
+    uint8_t *buf;
+    /* Total number of sectors. */
+    int nb_sectors;
+    /* Number of sectors for current AIO. */
+    int n_sectors;
+    /* New allocated block map entry. */
+    uint32_t bmap_first;
+    uint32_t bmap_last;
+    /* Buffer for new allocated block. */
+    void *block_buffer;
+    void *orig_buf;
+    int header_modified;
+    BlockDriverAIOCB *hd_aiocb;
+    struct iovec hd_iov;
+    QEMUIOVector hd_qiov;
+    QEMUBH *bh;
+} VdiAIOCB;
+
+typedef struct {
+    char text[0x40];
+    uint32_t signature;
+    uint32_t version;
+    uint32_t header_size;
+    uint32_t image_type;
+    uint32_t image_flags;
+    char description[256];
+    uint32_t offset_bmap;
+    uint32_t offset_data;
+    uint32_t cylinders;         /* disk geometry, unused here */
+    uint32_t heads;             /* disk geometry, unused here */
+    uint32_t sectors;           /* disk geometry, unused here */
+    uint32_t sector_size;
+    uint32_t unused1;
+    uint64_t disk_size;
+    uint32_t block_size;
+    uint32_t block_extra;       /* unused here */
+    uint32_t blocks_in_image;
+    uint32_t blocks_allocated;
+    uuid_t uuid_image;
+    uuid_t uuid_last_snap;
+    uuid_t uuid_link;
+    uuid_t uuid_parent;
+    uint64_t unused2[7];
+} VdiHeader;
+
+typedef struct {
+    BlockDriverState *hd;
+    /* The block map entries are little endian (even in memory). */
+    uint32_t *bmap;
+    /* Size of block (bytes). */
+    uint32_t block_size;
+    /* Size of block (sectors). */
+    uint32_t block_sectors;
+    /* First sector of block map. */
+    uint32_t bmap_sector;
+    /* VDI header (converted to host endianess). */
+    VdiHeader header;
+} BDRVVdiState;
+
+/* Change UUID from little endian (IPRT = VirtualBox format) to big endian
+ * format (network byte order, standard, see RFC 4122) and vice versa.
+ */
+static void uuid_convert(uuid_t uuid)
+{
+    bswap32s((uint32_t *)&uuid[0]);
+    bswap16s((uint16_t *)&uuid[4]);
+    bswap16s((uint16_t *)&uuid[6]);
+}
+
+static void vdi_header_to_cpu(VdiHeader *header)
+{
+    le32_to_cpus(&header->signature);
+    le32_to_cpus(&header->version);
+    le32_to_cpus(&header->header_size);
+    le32_to_cpus(&header->image_type);
+    le32_to_cpus(&header->image_flags);
+    le32_to_cpus(&header->offset_bmap);
+    le32_to_cpus(&header->offset_data);
+    le32_to_cpus(&header->cylinders);
+    le32_to_cpus(&header->heads);
+    le32_to_cpus(&header->sectors);
+    le32_to_cpus(&header->sector_size);
+    le64_to_cpus(&header->disk_size);
+    le32_to_cpus(&header->block_size);
+    le32_to_cpus(&header->block_extra);
+    le32_to_cpus(&header->blocks_in_image);
+    le32_to_cpus(&header->blocks_allocated);
+    uuid_convert(header->uuid_image);
+    uuid_convert(header->uuid_last_snap);
+    uuid_convert(header->uuid_link);
+    uuid_convert(header->uuid_parent);
+}
+
+static void vdi_header_to_le(VdiHeader *header)
+{
+    cpu_to_le32s(&header->signature);
+    cpu_to_le32s(&header->version);
+    cpu_to_le32s(&header->header_size);
+    cpu_to_le32s(&header->image_type);
+    cpu_to_le32s(&header->image_flags);
+    cpu_to_le32s(&header->offset_bmap);
+    cpu_to_le32s(&header->offset_data);
+    cpu_to_le32s(&header->cylinders);
+    cpu_to_le32s(&header->heads);
+    cpu_to_le32s(&header->sectors);
+    cpu_to_le32s(&header->sector_size);
+    cpu_to_le64s(&header->disk_size);
+    cpu_to_le32s(&header->block_size);
+    cpu_to_le32s(&header->block_extra);
+    cpu_to_le32s(&header->blocks_in_image);
+    cpu_to_le32s(&header->blocks_allocated);
+    cpu_to_le32s(&header->blocks_allocated);
+    uuid_convert(header->uuid_image);
+    uuid_convert(header->uuid_last_snap);
+    uuid_convert(header->uuid_link);
+    uuid_convert(header->uuid_parent);
+}
+
+#if defined(CONFIG_VDI_DEBUG)
+static void vdi_header_print(VdiHeader *header)
+{
+    char uuid[37];
+    logout("text        %s", header->text);
+    logout("signature   0x%04x\n", header->signature);
+    logout("header size 0x%04x\n", header->header_size);
+    logout("image type  0x%04x\n", header->image_type);
+    logout("image flags 0x%04x\n", header->image_flags);
+    logout("description %s\n", header->description);
+    logout("offset bmap 0x%04x\n", header->offset_bmap);
+    logout("offset data 0x%04x\n", header->offset_data);
+    logout("cylinders   0x%04x\n", header->cylinders);
+    logout("heads       0x%04x\n", header->heads);
+    logout("sectors     0x%04x\n", header->sectors);
+    logout("sector size 0x%04x\n", header->sector_size);
+    logout("image size  0x%" PRIx64 " B (%" PRIu64 " MiB)\n",
+           header->disk_size, header->disk_size / MiB);
+    logout("block size  0x%04x\n", header->block_size);
+    logout("block extra 0x%04x\n", header->block_extra);
+    logout("blocks tot. 0x%04x\n", header->blocks_in_image);
+    logout("blocks all. 0x%04x\n", header->blocks_allocated);
+    uuid_unparse(header->uuid_image, uuid);
+    logout("uuid image  %s\n", uuid);
+    uuid_unparse(header->uuid_last_snap, uuid);
+    logout("uuid snap   %s\n", uuid);
+    uuid_unparse(header->uuid_link, uuid);
+    logout("uuid link   %s\n", uuid);
+    uuid_unparse(header->uuid_parent, uuid);
+    logout("uuid parent %s\n", uuid);
+}
+#endif
+
+static int vdi_check(BlockDriverState *bs)
+{
+    /* TODO: additional checks possible. */
+    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
+    int n_errors = 0;
+    uint32_t blocks_allocated = 0;
+    uint32_t block;
+    uint32_t *bmap;
+    logout("\n");
+
+    bmap = qemu_malloc(s->header.blocks_in_image * sizeof(uint32_t));
+    memset(bmap, 0xff, s->header.blocks_in_image * sizeof(uint32_t));
+
+    /* Check block map and value of blocks_allocated. */
+    for (block = 0; block < s->header.blocks_in_image; block++) {
+        uint32_t bmap_entry = le32_to_cpu(s->bmap[block]);
+        if (bmap_entry != VDI_UNALLOCATED) {
+            if (bmap_entry < s->header.blocks_in_image) {
+                blocks_allocated++;
+                if (bmap[bmap_entry] == VDI_UNALLOCATED) {
+                    bmap[bmap_entry] = bmap_entry;
+                } else {
+                    fprintf(stderr, "ERROR: block index %" PRIu32
+                            " also used by %" PRIu32 "\n", bmap[bmap_entry], bmap_entry);
+                }
+            } else {
+                fprintf(stderr, "ERROR: block index %" PRIu32
+                        " too large, is %" PRIu32 "\n", block, bmap_entry);
+                n_errors++;
+            }
+        }
+    }
+    if (blocks_allocated != s->header.blocks_allocated) {
+        fprintf(stderr, "ERROR: allocated blocks mismatch, is %" PRIu32
+               ", should be %" PRIu32 "\n",
+               blocks_allocated, s->header.blocks_allocated);
+        n_errors++;
+    }
+
+    qemu_free(bmap);
+
+    return n_errors;
+}
+
+static int vdi_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+    /* TODO: vdi_get_info would be needed for machine snapshots.
+       vm_state_offset is still missing. */
+    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
+    logout("\n");
+    bdi->cluster_size = s->block_size;
+    bdi->vm_state_offset = 0;
+    return 0;
+}
+
+static int vdi_make_empty(BlockDriverState *bs)
+{
+    /* TODO: missing code. */
+    logout("\n");
+    /* The return value for missing code must be 0, see block.c. */
+    return 0;
+}
+
+static int vdi_probe(const uint8_t *buf, int buf_size, const char *filename)
+{
+    const VdiHeader *header = (const VdiHeader *)buf;
+    int result = 0;
+
+    logout("\n");
+
+    if (buf_size < sizeof(*header)) {
+        /* Header too small, no VDI. */
+    } else if (le32_to_cpu(header->signature) == VDI_SIGNATURE) {
+        result = 100;
+    }
+
+    if (result == 0) {
+        logout("no vdi image\n");
+    } else {
+        logout("%s", header->text);
+    }
+
+    return result;
+}
+
+static int vdi_open(BlockDriverState *bs, const char *filename, int flags)
+{
+    BDRVVdiState *s = bs->opaque;
+    VdiHeader header;
+    size_t bmap_size;
+    int ret;
+
+    logout("\n");
+
+    ret = bdrv_file_open(&s->hd, filename, flags);
+    if (ret < 0) {
+        return ret;
+    }
+
+    if (bdrv_read(s->hd, 0, (uint8_t *)&header, 1) < 0) {
+        goto fail;
+    }
+
+    vdi_header_to_cpu(&header);
+#if defined(CONFIG_VDI_DEBUG)
+    vdi_header_print(&header);
+#endif
+
+    if (header.version != VDI_VERSION_1_1) {
+        logout("unsupported version %u.%u\n",
+               header.version >> 16, header.version & 0xffff);
+        goto fail;
+    } else if (header.offset_bmap % SECTOR_SIZE != 0) {
+        /* We only support block maps which start on a sector boundary. */
+        logout("unsupported block map offset 0x%x B\n", header.offset_bmap);
+        goto fail;
+    } else if (header.offset_data % SECTOR_SIZE != 0) {
+        /* We only support data blocks which start on a sector boundary. */
+        logout("unsupported data offset 0x%x B\n", header.offset_data);
+        goto fail;
+    } else if (header.sector_size != SECTOR_SIZE) {
+        logout("unsupported sector size %u B\n", header.sector_size);
+        goto fail;
+    } else if (header.block_size != 1 * MiB) {
+        logout("unsupported block size %u B\n", header.block_size);
+        goto fail;
+    } else if (header.disk_size !=
+               (uint64_t)header.blocks_in_image * header.block_size) {
+        logout("unexpected block number %u B\n", header.blocks_in_image);
+        goto fail;
+    } else if (!uuid_is_null(header.uuid_link)) {
+        logout("link uuid != 0, unsupported\n");
+        goto fail;
+    } else if (!uuid_is_null(header.uuid_parent)) {
+        logout("parent uuid != 0, unsupported\n");
+        goto fail;
+    }
+
+    bs->total_sectors = header.disk_size / SECTOR_SIZE;
+
+    s->block_size = header.block_size;
+    s->block_sectors = header.block_size / SECTOR_SIZE;
+    s->bmap_sector = header.offset_bmap / SECTOR_SIZE;
+    s->header = header;
+
+    bmap_size = header.blocks_in_image * sizeof(uint32_t);
+    s->bmap = qemu_malloc(bmap_size);
+    if (bdrv_read(s->hd, s->bmap_sector,
+                  (uint8_t *)s->bmap, bmap_size / SECTOR_SIZE) < 0) {
+        goto fail_free_bmap;
+    }
+
+    return 0;
+
+ fail_free_bmap:
+    qemu_free(s->bmap);
+
+ fail:
+    bdrv_delete(s->hd);
+    return -1;
+}
+
+static int vdi_is_allocated(BlockDriverState *bs, int64_t sector_num,
+                             int nb_sectors, int *pnum)
+{
+    /* TODO: Check for too large sector_num (in bdrv_is_allocated or here). */
+    BDRVVdiState *s = (BDRVVdiState *)bs->opaque;
+    size_t bmap_index = sector_num / s->block_sectors;
+    size_t sector_in_block = sector_num % s->block_sectors;
+    int n_sectors = s->block_sectors - sector_in_block;
+    uint32_t bmap_entry = le32_to_cpu(s->bmap[bmap_index]);
+    logout("%p, %" PRId64 ", %d, %p\n", bs, sector_num, nb_sectors, pnum);
+    if (n_sectors > nb_sectors) {
+        n_sectors = nb_sectors;
+    }
+    *pnum = n_sectors;
+    return bmap_entry != VDI_UNALLOCATED;
+}
+
+static void vdi_aio_cancel(BlockDriverAIOCB *blockacb)
+{
+    /* TODO: This code is untested. How can I get it executed? */
+    VdiAIOCB *acb = (VdiAIOCB *)blockacb;
+    logout("\n");
+    if (acb->hd_aiocb) {
+        bdrv_aio_cancel(acb->hd_aiocb);
+    }
+    qemu_aio_release(acb);
+}
+
+static AIOPool vdi_aio_pool = {
+    .aiocb_size = sizeof(VdiAIOCB),
+    .cancel = vdi_aio_cancel,
+};
+
+static VdiAIOCB *vdi_aio_setup(BlockDriverState *bs, int64_t sector_num,
+        QEMUIOVector *qiov, int nb_sectors,
+        BlockDriverCompletionFunc *cb, void *opaque, int is_write)
+{
+    VdiAIOCB *acb;
+
+    logout("%p, %" PRId64 ", %p, %d, %p, %p, %d\n",
+           bs, sector_num, qiov, nb_sectors, cb, opaque, is_write);
+
+    acb = qemu_aio_get(&vdi_aio_pool, bs, cb, opaque);
+    if (acb) {
+        acb->hd_aiocb = NULL;
+        acb->sector_num = sector_num;
+        acb->qiov = qiov;
+        if (qiov->niov > 1) {
+            acb->buf = qemu_blockalign(bs, qiov->size);
+            acb->orig_buf = acb->buf;
+            if (is_write) {
+                qemu_iovec_to_buffer(qiov, acb->buf);
+            }
+        } else {
+            acb->buf = (uint8_t *)qiov->iov->iov_base;
+        }
+        acb->nb_sectors = nb_sectors;
+        acb->n_sectors = 0;
+        acb->bmap_first = VDI_UNALLOCATED;
+        acb->bmap_last = VDI_UNALLOCATED;
+        acb->block_buffer = NULL;
+        acb->header_modified = 0;
+    }
+    return acb;
+}
+
+static int vdi_schedule_bh(QEMUBHFunc *cb, VdiAIOCB *acb)
+{
+    logout("\n");
+
+    if (acb->bh) {
+        return -EIO;
+    }
+
+    acb->bh = qemu_bh_new(cb, acb);
+    if (!acb->bh) {
+        return -EIO;
+    }
+
+    qemu_bh_schedule(acb->bh);
+
+    return 0;
+}
+
+static void vdi_aio_read_cb(void *opaque, int ret);
+
+static void vdi_aio_read_bh(void *opaque)
+{
+    VdiAIOCB *acb = opaque;
+    logout("\n");
+    qemu_bh_delete(acb->bh);
+    acb->bh = NULL;
+    vdi_aio_read_cb(opaque, 0);
+}
+
+static void vdi_aio_read_cb(void *opaque, int ret)
+{
+    VdiAIOCB *acb = opaque;
+    BlockDriverState *bs = acb->common.bs;
+    BDRVVdiState *s = bs->opaque;
+    uint32_t bmap_entry;
+    uint32_t block_index;
+    uint32_t sector_in_block;
+    uint32_t n_sectors;
+
+    logout("%u sectors read\n", acb->n_sectors);
+
+    acb->hd_aiocb = NULL;
+
+    if (ret < 0) {
+        goto done;
+    }
+
+    acb->nb_sectors -= acb->n_sectors;
+
+    if (acb->nb_sectors == 0) {
+        /* request completed */
+        ret = 0;
+        goto done;
+    }
+
+    acb->sector_num += acb->n_sectors;
+    acb->buf += acb->n_sectors * SECTOR_SIZE;
+
+    block_index = acb->sector_num / s->block_sectors;
+    sector_in_block = acb->sector_num % s->block_sectors;
+    n_sectors = s->block_sectors - sector_in_block;
+    if (n_sectors > acb->nb_sectors) {
+        n_sectors = acb->nb_sectors;
+    }
+
+    logout("will read %u sectors starting at sector %" PRIu64 "\n",
+           n_sectors, acb->sector_num);
+
+    /* prepare next AIO request */
+    acb->n_sectors = n_sectors;
+    bmap_entry = le32_to_cpu(s->bmap[block_index]);
+    if (bmap_entry == VDI_UNALLOCATED) {
+        /* Block not allocated, return zeros, no need to wait. */
+        memset(acb->buf, 0, n_sectors * SECTOR_SIZE);
+        ret = vdi_schedule_bh(vdi_aio_read_bh, acb);
+        if (ret < 0) {
+            goto done;
+        }
+    } else {
+        uint64_t offset = s->header.offset_data / SECTOR_SIZE +
+                          (uint64_t)bmap_entry * s->block_sectors +
+                          sector_in_block;
+        acb->hd_iov.iov_base = (void *)acb->buf;
+        acb->hd_iov.iov_len = n_sectors * SECTOR_SIZE;
+        qemu_iovec_init_external(&acb->hd_qiov, &acb->hd_iov, 1);
+        acb->hd_aiocb = bdrv_aio_readv(s->hd, offset, &acb->hd_qiov,
+                                       n_sectors, vdi_aio_read_cb, acb);
+        if (acb->hd_aiocb == NULL) {
+            goto done;
+        }
+    }
+    return;
+done:
+    if (acb->qiov->niov > 1) {
+        qemu_iovec_from_buffer(acb->qiov, acb->orig_buf, acb->qiov->size);
+        qemu_vfree(acb->orig_buf);
+    }
+    acb->common.cb(acb->common.opaque, ret);
+    qemu_aio_release(acb);
+}
+
+static BlockDriverAIOCB *vdi_aio_readv(BlockDriverState *bs,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    VdiAIOCB *acb;
+    logout("\n");
+    acb = vdi_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 0);
+    if (!acb) {
+        return NULL;
+    }
+    vdi_aio_read_cb(acb, 0);
+    return &acb->common;
+}
+
+static void vdi_aio_write_cb(void *opaque, int ret)
+{
+    VdiAIOCB *acb = opaque;
+    BlockDriverState *bs = acb->common.bs;
+    BDRVVdiState *s = bs->opaque;
+    uint32_t bmap_entry;
+    uint32_t block_index;
+    uint32_t sector_in_block;
+    uint32_t n_sectors;
+
+    acb->hd_aiocb = NULL;
+
+    if (ret < 0) {
+        goto done;
+    }
+
+    acb->nb_sectors -= acb->n_sectors;
+    acb->sector_num += acb->n_sectors;
+    acb->buf += acb->n_sectors * SECTOR_SIZE;
+
+    if (acb->nb_sectors == 0) {
+        logout("finished data write\n");
+        acb->n_sectors = 0;
+        if (acb->header_modified) {
+            VdiHeader *header = acb->block_buffer;
+            logout("now writing modified header\n");
+            assert(acb->bmap_first != VDI_UNALLOCATED);
+            *header = s->header;
+            vdi_header_to_le(header);
+            acb->header_modified = 0;
+            acb->hd_iov.iov_base = acb->block_buffer;
+            acb->hd_iov.iov_len = SECTOR_SIZE;
+            qemu_iovec_init_external(&acb->hd_qiov, &acb->hd_iov, 1);
+            acb->hd_aiocb = bdrv_aio_writev(s->hd, 0, &acb->hd_qiov, 1,
+                                            vdi_aio_write_cb, acb);
+            if (acb->hd_aiocb == NULL) {
+                goto done;
+            }
+            return;
+        } else if (acb->bmap_first != VDI_UNALLOCATED) {
+            /* One or more new blocks were allocated. */
+            uint64_t offset;
+            uint32_t bmap_first;
+            uint32_t bmap_last;
+            qemu_free(acb->block_buffer);
+            acb->block_buffer = NULL;
+            bmap_first = acb->bmap_first;
+            bmap_last = acb->bmap_last;
+            logout("now writing modified block map entry %u...%u\n",
+                   bmap_first, bmap_last);
+            /* Write modified sectors from block map. */
+            bmap_first /= (SECTOR_SIZE / sizeof(uint32_t));
+            bmap_last /= (SECTOR_SIZE / sizeof(uint32_t));
+            n_sectors = bmap_last - bmap_first + 1;
+            offset = s->bmap_sector + bmap_first;
+            acb->bmap_first = VDI_UNALLOCATED;
+            acb->hd_iov.iov_base = (uint8_t *)&s->bmap[0] +
+                                   bmap_first * SECTOR_SIZE;
+            acb->hd_iov.iov_len = n_sectors * SECTOR_SIZE;
+            qemu_iovec_init_external(&acb->hd_qiov, &acb->hd_iov, 1);
+            logout("will write %u block map sectors starting from entry %u\n",
+                   n_sectors, bmap_first);
+            acb->hd_aiocb = bdrv_aio_writev(s->hd, offset, &acb->hd_qiov,
+                                            n_sectors, vdi_aio_write_cb, acb);
+            if (acb->hd_aiocb == NULL) {
+                goto done;
+            }
+            return;
+        }
+        ret = 0;
+        goto done;
+    }
+
+    logout("%u sectors written\n", acb->n_sectors);
+
+    block_index = acb->sector_num / s->block_sectors;
+    sector_in_block = acb->sector_num % s->block_sectors;
+    n_sectors = s->block_sectors - sector_in_block;
+    if (n_sectors > acb->nb_sectors) {
+        n_sectors = acb->nb_sectors;
+    }
+
+    logout("will write %u sectors starting at sector %" PRIu64 "\n",
+           n_sectors, acb->sector_num);
+
+    /* prepare next AIO request */
+    acb->n_sectors = n_sectors;
+    bmap_entry = le32_to_cpu(s->bmap[block_index]);
+    if (bmap_entry == VDI_UNALLOCATED) {
+        /* Allocate new block and write to it. */
+        uint64_t offset;
+        uint8_t *block;
+        bmap_entry = s->header.blocks_allocated;
+        s->bmap[block_index] = cpu_to_le32(bmap_entry);
+        s->header.blocks_allocated++;
+        offset = s->header.offset_data / SECTOR_SIZE +
+                 (uint64_t)bmap_entry * s->block_sectors;
+        block = acb->block_buffer;
+        if (block == NULL) {
+            block = qemu_mallocz(s->block_size);
+            acb->block_buffer = block;
+            acb->bmap_first = block_index;
+            assert(!acb->header_modified);
+            acb->header_modified = 1;
+        }
+        acb->bmap_last = block_index;
+        memcpy(block + sector_in_block * SECTOR_SIZE,
+               acb->buf, n_sectors * SECTOR_SIZE);
+        acb->hd_iov.iov_base = block;
+        acb->hd_iov.iov_len = s->block_size;
+        qemu_iovec_init_external(&acb->hd_qiov, &acb->hd_iov, 1);
+        acb->hd_aiocb = bdrv_aio_writev(s->hd, offset,
+                                        &acb->hd_qiov, s->block_sectors,
+                                        vdi_aio_write_cb, acb);
+        if (acb->hd_aiocb == NULL) {
+            goto done;
+        }
+    } else {
+        uint64_t offset = s->header.offset_data / SECTOR_SIZE +
+                          (uint64_t)bmap_entry * s->block_sectors +
+                          sector_in_block;
+        acb->hd_iov.iov_base = acb->buf;
+        acb->hd_iov.iov_len = n_sectors * SECTOR_SIZE;
+        qemu_iovec_init_external(&acb->hd_qiov, &acb->hd_iov, 1);
+        acb->hd_aiocb = bdrv_aio_writev(s->hd, offset, &acb->hd_qiov,
+                                        n_sectors, vdi_aio_write_cb, acb);
+        if (acb->hd_aiocb == NULL) {
+            goto done;
+        }
+    }
+
+    return;
+
+done:
+    if (acb->qiov->niov > 1) {
+        qemu_vfree(acb->orig_buf);
+    }
+    acb->common.cb(acb->common.opaque, ret);
+    qemu_aio_release(acb);
+}
+
+static BlockDriverAIOCB *vdi_aio_writev(BlockDriverState *bs,
+        int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+        BlockDriverCompletionFunc *cb, void *opaque)
+{
+    VdiAIOCB *acb;
+    logout("\n");
+    acb = vdi_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 1);
+    if (!acb) {
+        return NULL;
+    }
+    vdi_aio_write_cb(acb, 0);
+    return &acb->common;
+}
+
+static int vdi_create(const char *filename, QEMUOptionParameter *options)
+{
+    int fd;
+    int result = 0;
+    uint64_t bytes = 0;
+    uint32_t blocks;
+    size_t block_size = 1 * MiB;
+    uint32_t image_type = VDI_TYPE_DYNAMIC;
+    VdiHeader header;
+    size_t i;
+    size_t bmap_size;
+    uint32_t *bmap;
+
+    logout("\n");
+
+    /* Read out options. */
+    while (options && options->name) {
+        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
+            bytes = options->value.n;
+#if defined(CONFIG_VDI_BLOCK_SIZE)
+        } else if (!strcmp(options->name, BLOCK_OPT_CLUSTER_SIZE)) {
+            if (options->value.n) {
+                /* TODO: Additional checks (SECTOR_SIZE * 2^n, ...). */
+                block_size = options->value.n;
+            }
+#endif
+#if defined(CONFIG_VDI_STATIC_IMAGE)
+        } else if (!strcmp(options->name, BLOCK_OPT_STATIC)) {
+            image_type = VDI_TYPE_STATIC;
+#endif
+        }
+        options++;
+    }
+
+    fd = open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY | O_LARGEFILE,
+              0644);
+    if (fd < 0) {
+        return -errno;
+    }
+
+    blocks = bytes / block_size;
+    bmap_size = blocks * sizeof(uint32_t);
+    bmap_size = ((bmap_size + SECTOR_SIZE - 1) & ~(SECTOR_SIZE -1));
+
+    memset(&header, 0, sizeof(header));
+    strcpy(header.text, VDI_TEXT);
+    header.signature = VDI_SIGNATURE;
+    header.version = VDI_VERSION_1_1;
+    header.header_size = 0x180;
+    header.image_type = image_type;
+    header.offset_bmap = 0x200;
+    header.offset_data = 0x200 + bmap_size;
+    header.sector_size = SECTOR_SIZE;
+    header.disk_size = bytes;
+    header.block_size = block_size;
+    header.blocks_in_image = blocks;
+    uuid_generate(header.uuid_image);
+    uuid_generate(header.uuid_last_snap);
+    /* There is no need to set header.uuid_link or header.uuid_parent here. */
+#if defined(CONFIG_VDI_DEBUG)
+    vdi_header_print(&header);
+#endif
+    vdi_header_to_le(&header);
+    if (write(fd, &header, sizeof(header)) < 0) {
+        result = -errno;
+    }
+
+    bmap = (uint32_t *)qemu_mallocz(bmap_size);
+    for (i = 0; i < blocks; i++) {
+        if (image_type == VDI_TYPE_STATIC) {
+            bmap[i] = i;
+        } else {
+            bmap[i] = VDI_UNALLOCATED;
+        }
+    }
+    if (write(fd, bmap, bmap_size) < 0) {
+        result = -errno;
+    }
+    qemu_free(bmap);
+    if (image_type == VDI_TYPE_STATIC) {
+        if (ftruncate(fd, sizeof(header) + bmap_size + blocks * block_size)) {
+            result = -errno;
+        }
+    }
+
+    if (close(fd) < 0) {
+        result = -errno;
+    }
+
+    return result;
+}
+
+static void vdi_close(BlockDriverState *bs)
+{
+    BDRVVdiState *s = bs->opaque;
+    logout("\n");
+    bdrv_delete(s->hd);
+}
+
+static void vdi_flush(BlockDriverState *bs)
+{
+    BDRVVdiState *s = bs->opaque;
+    logout("\n");
+    bdrv_flush(s->hd);
+}
+
+
+static QEMUOptionParameter vdi_create_options[] = {
+    {
+        .name = BLOCK_OPT_SIZE,
+        .type = OPT_SIZE,
+        .help = "Virtual disk size"
+    },
+#if defined(CONFIG_VDI_BLOCK_SIZE)
+    {
+        .name = BLOCK_OPT_CLUSTER_SIZE,
+        .type = OPT_SIZE,
+        .help = "VDI cluster (block) size"
+    },
+#endif
+#if defined(CONFIG_VDI_STATIC_IMAGE)
+    {
+        .name = BLOCK_OPT_STATIC,
+        .type = OPT_FLAG,
+        .help = "VDI static (pre-allocated) image"
+    },
+#endif
+    /* TODO: An additional option to set UUID values might be useful. */
+    { NULL }
+};
+
+static BlockDriver bdrv_vdi = {
+    .format_name = "vdi",
+    .instance_size = sizeof(BDRVVdiState),
+    .bdrv_probe = vdi_probe,
+    .bdrv_open = vdi_open,
+    .bdrv_close = vdi_close,
+    .bdrv_create = vdi_create,
+    .bdrv_flush = vdi_flush,
+    .bdrv_is_allocated = vdi_is_allocated,
+    .bdrv_make_empty = vdi_make_empty,
+
+    .bdrv_aio_readv = vdi_aio_readv,
+#if defined(CONFIG_VDI_WRITE)
+    .bdrv_aio_writev = vdi_aio_writev,
+#endif
+
+    .bdrv_get_info = vdi_get_info,
+
+    .create_options = vdi_create_options,
+    .bdrv_check = vdi_check,
+};
+
+static void bdrv_vdi_init(void)
+{
+    logout("\n");
+    bdrv_register(&bdrv_vdi);
+}
+
+block_init(bdrv_vdi_init);
diff --git a/qemu-img.texi b/qemu-img.texi
index 49d4e59..69e24b5 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -48,6 +48,8 @@ Old QEMU image format. Left for compatibility.
 User Mode Linux Copy On Write image format. Used to be the only growable
 image format in QEMU. It is supported only for compatibility with
 previous versions. It does not work on win32.
+@item vdi
+VirtualBox 1.1 compatible image format.
 @item vmdk
 VMware 3 and 4 compatible image format.
 @item cloop
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Add new block driver for the VDI format (use aio)
  2009-07-31 15:04           ` Christoph Hellwig
@ 2009-07-31 19:53             ` Stefan Weil
  0 siblings, 0 replies; 44+ messages in thread
From: Stefan Weil @ 2009-07-31 19:53 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Kevin Wolf, QEMU Developers

Christoph Hellwig schrieb:
> On Mon, Jul 27, 2009 at 10:00:34AM +0200, Kevin Wolf wrote:
>   
>> Ok, that makes sense. Probably we should remove the #define completely
>> then. I mean, why creating images that nobody - not even we ourselves -
>> can read?
>>     
>
> I agree.  As mentioned during the previous rounds all these ifdef parts
> of code that can only be compiled in/out by touching the source code are
> really bad.  Either they are good enough to be enabled unconditionally
> (or at least through configure if they require a library or similar) or
> they are broken / useless enough to not bother.  If virtualbox only
> supports 1k block size images and we do aswell there's no point in
> carrying around this dead code.
>
>   
>>>> I guess you should remove this block before the patch is included.
>>>>   
>>>>         
>>> This is also one of the details were hints of the block driver experts
>>> would be helpful as I did not understand this aio_remove / aio_cancel
>>> mechanism.
>>>       
>> I wouldn't consider myself an AIO expert and I don't want to tell you
>> something wrong, so maybe Christoph would be the right one here?
>>     
>
> #if 0 is a horrible way for hints.  Coments with XXX: or TODO: are much
> better documentation.  I'll take a look at the aio implementation, but
> I'm far from expert on the qemu aio code.
>
>   
>>> I think it would be possible to extend the specification
>>> to support compression or encryption.
>>>
>>> The official specification (as far as I know it) does not
>>> support compression (nor encryption).
>>>       
>> Then remove the entry. The function vdi_write_compressed doesn't exist
>> and doesn't even make sense with the current specification. The same
>> applies for vdi_set_key.
>>     
>
> Seconded, keeping function stubs around just bloats and obsfucates the
> code without reason.
>
>
>   

Hi Christoph

Thanks for the review. Most of your comments were considered in my
latest patch version which I just sent to the list.

I assume that there will be a need for block sizes larger than 1 MiB
in very large images, so I did not remove these parts of the code.

Regards

Stefan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format
  2009-07-06 13:37 ` [Qemu-devel] [PATCH] RFC: " Anthony Liguori
  2009-07-06 21:10   ` Stefan Weil
@ 2009-08-02 14:27   ` Avi Kivity
  2009-08-03  2:25     ` Anthony Liguori
  1 sibling, 1 reply; 44+ messages in thread
From: Avi Kivity @ 2009-08-02 14:27 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Kevin Wolf, QEMU Developers, Christoph Hellwig

On 07/06/2009 04:37 PM, Anthony Liguori wrote:
>
> I'd really like to get rid of synchronous IO functions in the block 
> layer.  One way to do this is to insist that all new block drivers 
> only implement the AIO functions.
>
> I think we should make this decree but I'd like to know if other 
> people think this is unreasonable first.  One potential model of block 
> drivers would involve synchronous IO and threads.  I'm not a big fan 
> of that model and I don't think it's an easy conversion from today's 
> synchronous IO drivers to that model because the locking and 
> re-entrance needs careful consideration.
>

I agree that sync+threads is not easy, but well performing async is 
much, much harder.  Consider that qcow2 still has synchronous 
operations, and that eliminating the RMW when writing a partial cluster 
concurrently (a very common operation with 64K clusters) is very hard to 
do ayncly and much easier syncly.

Given in addition the large numbers of format drivers, I think we should 
prefer sync+threads over trying to convert all format drivers to full async.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format
  2009-08-02 14:27   ` Avi Kivity
@ 2009-08-03  2:25     ` Anthony Liguori
  2009-08-03 13:02       ` Avi Kivity
  0 siblings, 1 reply; 44+ messages in thread
From: Anthony Liguori @ 2009-08-03  2:25 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Kevin Wolf, QEMU Developers, Christoph Hellwig

Avi Kivity wrote:
> On 07/06/2009 04:37 PM, Anthony Liguori wrote:
>>
>> I'd really like to get rid of synchronous IO functions in the block 
>> layer.  One way to do this is to insist that all new block drivers 
>> only implement the AIO functions.
>>
>> I think we should make this decree but I'd like to know if other 
>> people think this is unreasonable first.  One potential model of 
>> block drivers would involve synchronous IO and threads.  I'm not a 
>> big fan of that model and I don't think it's an easy conversion from 
>> today's synchronous IO drivers to that model because the locking and 
>> re-entrance needs careful consideration.
>>
>
> I agree that sync+threads is not easy, but well performing async is 
> much, much harder.  Consider that qcow2 still has synchronous 
> operations, and that eliminating the RMW when writing a partial 
> cluster concurrently (a very common operation with 64K clusters) is 
> very hard to do ayncly and much easier syncly.

Supporting parallel RMW operations is certainly difficult, but you're 
confusing parallel RMW ops with asynchronous RMW ops.  You just have to 
queue requests and handle them in order.  It's only mildly more 
difficult to deal with asynchronous I/O and it avoids all the nastiness 
associated with threads and locking.

Fundamentally, threads don't help the RMW problem because you probably 
would just hold a look for the entire RMW operation so you're 
effectively queuing any RMW op.

> Given in addition the large numbers of format drivers, I think we 
> should prefer sync+threads over trying to convert all format drivers 
> to full async.

It's just shifting the problem from one place to another.  Instead of 
figuring out the state machine, you have to figure out how to do the 
locking.  The danger of the later is that it gives you the illusion that 
it's an easy problem and is therefore prone to error.

Regards,

Anthony Liguori

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format
  2009-08-03  2:25     ` Anthony Liguori
@ 2009-08-03 13:02       ` Avi Kivity
  2009-08-03 15:20         ` Christoph Hellwig
  0 siblings, 1 reply; 44+ messages in thread
From: Avi Kivity @ 2009-08-03 13:02 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Kevin Wolf, QEMU Developers, Christoph Hellwig

On 08/03/2009 05:25 AM, Anthony Liguori wrote:
> Avi Kivity wrote:
>> On 07/06/2009 04:37 PM, Anthony Liguori wrote:
>>>
>>> I'd really like to get rid of synchronous IO functions in the block 
>>> layer.  One way to do this is to insist that all new block drivers 
>>> only implement the AIO functions.
>>>
>>> I think we should make this decree but I'd like to know if other 
>>> people think this is unreasonable first.  One potential model of 
>>> block drivers would involve synchronous IO and threads.  I'm not a 
>>> big fan of that model and I don't think it's an easy conversion from 
>>> today's synchronous IO drivers to that model because the locking and 
>>> re-entrance needs careful consideration.
>>>
>>
>> I agree that sync+threads is not easy, but well performing async is 
>> much, much harder.  Consider that qcow2 still has synchronous 
>> operations, and that eliminating the RMW when writing a partial 
>> cluster concurrently (a very common operation with 64K clusters) is 
>> very hard to do ayncly and much easier syncly.
>
> Supporting parallel RMW operations is certainly difficult, but you're 
> confusing parallel RMW ops with asynchronous RMW ops.  You just have 
> to queue requests and handle them in order.  It's only mildly more 
> difficult to deal with asynchronous I/O and it avoids all the 
> nastiness associated with threads and locking.

I'm talking about a guest sequential write emitted as multiple adjacent 
requests in parallel.  Currently we'll write the first request and the 
second request in different locations, then do a rmw to merge the two 
blocks (I think...).

> Fundamentally, threads don't help the RMW problem because you probably 
> would just hold a look for the entire RMW operation so you're 
> effectively queuing any RMW op.

You do get some queuing but layout is improved, and it's not a W-RMW; 
instead it's a W-WW.

Theoretically anything you can do with threads you can do with async 
operations but experience has proven that async is much more difficult.  
Consider the last qcow2 bug.

>
>> Given in addition the large numbers of format drivers, I think we 
>> should prefer sync+threads over trying to convert all format drivers 
>> to full async.
>
> It's just shifting the problem from one place to another.  Instead of 
> figuring out the state machine, you have to figure out how to do the 
> locking.  The danger of the later is that it gives you the illusion 
> that it's an easy problem and is therefore prone to error.

Locking _is_ an easier problem than figuring out the state machine.  I 
can't prove this but there's numerous anecdotal evidence on the subject.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format
  2009-08-03 13:02       ` Avi Kivity
@ 2009-08-03 15:20         ` Christoph Hellwig
  0 siblings, 0 replies; 44+ messages in thread
From: Christoph Hellwig @ 2009-08-03 15:20 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Kevin Wolf, QEMU Developers, Christoph Hellwig

On Mon, Aug 03, 2009 at 04:02:58PM +0300, Avi Kivity wrote:
> Theoretically anything you can do with threads you can do with async 
> operations but experience has proven that async is much more difficult.  
> Consider the last qcow2 bug.

> Locking _is_ an easier problem than figuring out the state machine.  I 
> can't prove this but there's numerous anecdotal evidence on the subject.

Having worked with state machines and threads I agree.  Unless you have
very good runtime support (which I think is almost impossible in C) it's
extremly hard and error prone to do state machines that track every
possible blocking point.  Threads and locking are much easier, but they
come with a cost.  With a threaded model like the one in the qemu taw posix
code currently we will context switch for every request, even if it
would not block.

> 
> -- 
> error compiling committee.c: too many arguments to function
---end quoted text---

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] add support for new option of vdi format
  2009-07-23 20:30   ` [Qemu-devel] [PATCH] add support for new option of vdi format Stefan Weil
  2009-07-23 20:34     ` [Qemu-devel] " Stefan Weil
  2009-07-31 14:59     ` [Qemu-devel] " Christoph Hellwig
@ 2009-08-13 16:53     ` Christoph Hellwig
  2 siblings, 0 replies; 44+ messages in thread
From: Christoph Hellwig @ 2009-08-13 16:53 UTC (permalink / raw)
  To: Stefan Weil; +Cc: QEMU Developers

On Thu, Jul 23, 2009 at 10:30:45PM +0200, Stefan Weil wrote:
> VDI supports an image option 'static'.
> Ignore "static=off" from qemu-img output.

Thanks, I've put this in now.

With current qmu test cases 005, 009, 010 and 011 fail for me with vdi,
any idea why?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [Qemu-devel] [PATCH] Check availability of uuid header / library
  2009-07-23 20:27   ` [Qemu-devel] [PATCH] Check availability of uuid header / lib Stefan Weil
  2009-07-24  6:32     ` Christoph Egger
@ 2009-10-01 18:10     ` Stefan Weil
  1 sibling, 0 replies; 44+ messages in thread
From: Stefan Weil @ 2009-10-01 18:10 UTC (permalink / raw)
  To: QEMU Developers

If available, the Universally Unique Identifier library
is used by the vdi block driver.

Other parts of QEMU (vl.c) could also use it.

This is an updated version of my previous patch
with changes needed by the current QEMU configuration
standard.

Signed-off-by: Stefan Weil <weil@mail.berlios.de>
---
 block/vdi.c |    4 ++--
 configure   |   36 ++++++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index f5e38db..45aa81c 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -53,7 +53,7 @@
 #include "block_int.h"
 #include "module.h"
 
-#if defined(HAVE_UUID_H)
+#if defined(CONFIG_UUID)
 #include <uuid/uuid.h>
 #else
 /* TODO: move uuid emulation to some central place in QEMU. */
@@ -116,7 +116,7 @@ void uuid_unparse(const uuid_t uu, char *out);
 /* Unallocated blocks use this index (no need to convert endianess). */
 #define VDI_UNALLOCATED UINT32_MAX
 
-#if !defined(HAVE_UUID_H)
+#if !defined(CONFIG_UUID)
 void uuid_generate(uuid_t out)
 {
     memset(out, 0, sizeof(out));
diff --git a/configure b/configure
index fb5b6bb..33521d1 100755
--- a/configure
+++ b/configure
@@ -187,6 +187,7 @@ kvm=""
 nptl=""
 sdl=""
 sparse="no"
+uuid=""
 vde=""
 vnc_tls=""
 vnc_sasl=""
@@ -451,6 +452,10 @@ for opt do
   ;;
   --disable-slirp) slirp="no"
   ;;
+  --disable-uuid) uuid="no"
+  ;;
+  --enable-uuid) uuid="yes"
+  ;;
   --disable-vde) vde="no"
   ;;
   --enable-vde) vde="yes"
@@ -695,6 +700,8 @@ echo "  --fmod-inc               path to FMOD includes"
 echo "  --oss-lib                path to OSS library"
 echo "  --enable-uname-release=R Return R for uname -r in usermode emulation"
 echo "  --sparc_cpu=V            Build qemu for Sparc architecture v7, v8, v8plus, v8plusa, v9"
+echo "  --disable-uuid           disable uuid support"
+echo "  --enable-uuid            enable uuid support"
 echo "  --disable-vde            disable support for vde network"
 echo "  --enable-vde             enable support for vde network"
 echo "  --disable-linux-aio      disable Linux AIO support"
@@ -1047,6 +1054,31 @@ if compile_prog "" "" ; then
 fi
 
 ##########################################
+# uuid_generate() probe, used for vdi block driver
+if test "$uuid" != "no" ; then
+  uuid_libs="-luuid"
+  cat > $TMPC << EOF
+#include <uuid/uuid.h>
+int main(void)
+{
+    uuid_t my_uuid;
+    uuid_generate(my_uuid);
+    return 0;
+}
+EOF
+  if compile_prog "" "$uuid_libs" ; then
+    uuid="yes"
+    libs_softmmu="$uuid_libs $libs_softmmu"
+    libs_tools="$uuid_libs $libs_tools"
+  else
+    if test "$uuid" = "yes" ; then
+      feature_not_found "uuid"
+    fi
+    uuid=no
+  fi
+fi
+
+##########################################
 # vde libraries probe
 if test "$vde" != "no" ; then
   vde_libs="-lvdeplug"
@@ -1751,6 +1783,7 @@ echo "KVM support       $kvm"
 echo "fdt support       $fdt"
 echo "preadv support    $preadv"
 echo "fdatasync         $fdatasync"
+echo "uuid support      $uuid"
 
 if test $sdl_too_old = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -1861,6 +1894,9 @@ fi
 if test "$fnmatch" = "yes" ; then
   echo "CONFIG_FNMATCH=y" >> $config_host_mak
 fi
+if test "$uuid" = "yes" ; then
+  echo "CONFIG_UUID=y" >> $config_host_mak
+fi
 qemu_version=`head $source_path/VERSION`
 echo "VERSION=$qemu_version" >>$config_host_mak
 echo "PKGVERSION=$pkgversion" >>$config_host_mak
-- 
1.5.6.5

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Check availability of uuid header / lib
  2009-07-24  6:32     ` Christoph Egger
@ 2009-10-01 18:13       ` Stefan Weil
  2009-10-02  8:32         ` Christoph Egger
  0 siblings, 1 reply; 44+ messages in thread
From: Stefan Weil @ 2009-10-01 18:13 UTC (permalink / raw)
  To: Christoph Egger; +Cc: qemu-devel

Christoph Egger schrieb:
> On Thursday 23 July 2009 22:27:54 Stefan Weil wrote:
>   
>> The Universally Unique Identifier library will be used
>> for the new vdi block driver and maybe other parts of QEMU.
>>     
>
> This is very Linux specific.
> On NetBSD, the header is in <sys/uuid.h> and part of libc.
> The API implements DCE 1.1 RPC specification which is
> very different from Linux uuid.
>
> Christoph
>
>   


The Linux implementation claims to conform to OSF DCE 1.1,
see manpages of libuuid.

Do you think that we need a QEMU wrapper to handle different
implementations?

Stefan

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] [PATCH] Check availability of uuid header / lib
  2009-10-01 18:13       ` Stefan Weil
@ 2009-10-02  8:32         ` Christoph Egger
  0 siblings, 0 replies; 44+ messages in thread
From: Christoph Egger @ 2009-10-02  8:32 UTC (permalink / raw)
  To: Stefan Weil; +Cc: qemu-devel

On Thursday 01 October 2009 20:13:25 Stefan Weil wrote:
> Christoph Egger schrieb:
> > On Thursday 23 July 2009 22:27:54 Stefan Weil wrote:
> >> The Universally Unique Identifier library will be used
> >> for the new vdi block driver and maybe other parts of QEMU.
> >
> > This is very Linux specific.
> > On NetBSD, the header is in <sys/uuid.h> and part of libc.
> > The API implements DCE 1.1 RPC specification which is
> > very different from Linux uuid.
> >
> > Christoph
>
> The Linux implementation claims to conform to OSF DCE 1.1,
> see manpages of libuuid.
>
> Do you think that we need a QEMU wrapper to handle different
> implementations?

Yes. This has already been abstracted in Xen's blktap2 implementation.
You can adapt it to QEMU.

Christoph


-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Karl-Hammerschmidt-Str. 34, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Andrew Bowd, Thomas M. McCoy, Giuliano Meroni
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2009-10-02  8:32 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-07-03 19:24 [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format Stefan Weil
2009-07-03 19:29 ` [Qemu-devel] [PATCH] Check availability of uuid header / lib Stefan Weil
2009-07-03 19:29   ` [Qemu-devel] [PATCH] Add new block driver for the VDI format Stefan Weil
2009-07-05  8:05     ` Christoph Hellwig
2009-07-05 14:02       ` Stefan Weil
2009-07-06 10:25         ` Christoph Hellwig
2009-07-06 17:19           ` Stefan Weil
2009-07-05 14:44     ` Kevin Wolf
2009-07-06 13:37 ` [Qemu-devel] [PATCH] RFC: " Anthony Liguori
2009-07-06 21:10   ` Stefan Weil
2009-07-06 21:28     ` Anthony Liguori
2009-07-07  7:55     ` Kevin Wolf
2009-07-07  9:04       ` Jamie Lokier
2009-07-07 10:30       ` Christoph Hellwig
2009-07-07 10:33         ` Kevin Wolf
2009-08-02 14:27   ` Avi Kivity
2009-08-03  2:25     ` Anthony Liguori
2009-08-03 13:02       ` Avi Kivity
2009-08-03 15:20         ` Christoph Hellwig
2009-07-23 15:58 ` [Qemu-devel] [PATCH] RFC: Add new block driver for the VDI format (aio version) Stefan Weil
2009-07-23 20:27   ` [Qemu-devel] [PATCH] Check availability of uuid header / lib Stefan Weil
2009-07-24  6:32     ` Christoph Egger
2009-10-01 18:13       ` Stefan Weil
2009-10-02  8:32         ` Christoph Egger
2009-10-01 18:10     ` [Qemu-devel] [PATCH] Check availability of uuid header / library Stefan Weil
2009-07-23 20:29   ` [Qemu-devel] [PATCH] Add new block driver for the VDI format (use aio) Stefan Weil
2009-07-24  9:18     ` Kevin Wolf
2009-07-24 16:20       ` Stefan Weil
2009-07-27  8:00         ` Kevin Wolf
2009-07-27  9:23           ` Jamie Lokier
2009-07-28  6:37             ` Amit Shah
2009-07-28  8:34               ` Jamie Lokier
2009-07-28  8:56                 ` Daniel P. Berrange
2009-07-28  9:03                   ` Jamie Lokier
2009-07-28  9:11                     ` Kevin Wolf
2009-07-31 15:04           ` Christoph Hellwig
2009-07-31 19:53             ` Stefan Weil
2009-07-31 15:25     ` Anthony Liguori
2009-07-31 18:27       ` Stefan Weil
2009-07-31 19:45         ` [Qemu-devel] [PATCH] Add new block driver for the VDI format (only aio supported) Stefan Weil
2009-07-23 20:30   ` [Qemu-devel] [PATCH] add support for new option of vdi format Stefan Weil
2009-07-23 20:34     ` [Qemu-devel] " Stefan Weil
2009-07-31 14:59     ` [Qemu-devel] " Christoph Hellwig
2009-08-13 16:53     ` Christoph Hellwig

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.