* [Qemu-devel] [PATCH 0/9] VHDX log replay and write support
@ 2013-07-24 17:54 Jeff Cody
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 1/9] block: vhdx - minor comments and typo correction Jeff Cody
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: Jeff Cody @ 2013-07-24 17:54 UTC
  To: qemu-devel; +Cc: kwolf, stefanha

This patch series contains the initial VHDX log parsing, replay,
and write support.

This will allow an existing log in a VHDX image to be replayed (e.g., a VHDX
image from a Hyper-V host that crashed).  In addition, metadata writes are
enabled through the log.  This allows write support to be enabled for VHDX,
as the BAT can be updated safely via the log journal.

The patches are also available from github, for testing:
https://github.com/codyprime/qemu-kvm-jtc/tree/jtc-vhdx-latest

Jeff Cody (9):
  block: vhdx - minor comments and typo correction.
  block: vhdx - add header update capability.
  block: vhdx code movement - VHDXMetadataEntries and BDRVVHDXState to
    header.
  block: vhdx - log support struct and defines
  block: vhdx - break endian translation functions out
  block: vhdx - update log guid in header, and first write tracker
  block: vhdx - log parsing, replay, and flush support
  block: vhdx - add log write support
  block: vhdx write support

 block/Makefile.objs |    2 +-
 block/vhdx-endian.c |  141 ++++++++
 block/vhdx-log.c    | 1007 +++++++++++++++++++++++++++++++++++++++++++++++++++
 block/vhdx.c        |  394 ++++++++++++++------
 block/vhdx.h        |  110 +++++-
 configure           |   13 +
 6 files changed, 1556 insertions(+), 111 deletions(-)
 create mode 100644 block/vhdx-endian.c
 create mode 100644 block/vhdx-log.c

-- 
1.8.1.4


* [Qemu-devel] [PATCH 1/9] block: vhdx - minor comments and typo correction.
  2013-07-24 17:54 [Qemu-devel] [PATCH 0/9] VHDX log replay and write support Jeff Cody
@ 2013-07-24 17:54 ` Jeff Cody
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 2/9] block: vhdx - add header update capability Jeff Cody
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Jeff Cody @ 2013-07-24 17:54 UTC
  To: qemu-devel; +Cc: kwolf, stefanha

Just a couple of minor comments to help note where allocated
buffers are freed, and a typo fix.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 block/vhdx.c | 6 ++++--
 block/vhdx.h | 6 +++---
 2 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/block/vhdx.c b/block/vhdx.c
index e9704b1..56bc88e 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -6,9 +6,9 @@
  * Authors:
  *  Jeff Cody <jcody@redhat.com>
  *
- *  This is based on the "VHDX Format Specification v0.95", published 4/12/2012
+ *  This is based on the "VHDX Format Specification v1.00", published 8/25/2012
  *  by Microsoft:
- *      https://www.microsoft.com/en-us/download/details.aspx?id=29681
+ *      https://www.microsoft.com/en-us/download/details.aspx?id=34750
  *
  * This work is licensed under the terms of the GNU LGPL, version 2 or later.
  * See the COPYING.LIB file in the top-level directory.
@@ -262,6 +262,7 @@ static int vhdx_parse_header(BlockDriverState *bs, BDRVVHDXState *s)
     uint64_t h2_seq = 0;
     uint8_t *buffer;
 
+    /* header1 & header2 are freed in vhdx_close() */
     header1 = qemu_blockalign(bs, sizeof(VHDXHeader));
     header2 = qemu_blockalign(bs, sizeof(VHDXHeader));
 
@@ -787,6 +788,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
         goto fail;
     }
 
+    /* s->bat is freed in vhdx_close() */
     s->bat = qemu_blockalign(bs, s->bat_rt.length);
 
     ret = bdrv_pread(bs->file, s->bat_offset, s->bat, s->bat_rt.length);
diff --git a/block/vhdx.h b/block/vhdx.h
index c3b64c6..1dbb320 100644
--- a/block/vhdx.h
+++ b/block/vhdx.h
@@ -6,9 +6,9 @@
  * Authors:
  *  Jeff Cody <jcody@redhat.com>
  *
- *  This is based on the "VHDX Format Specification v0.95", published 4/12/2012
+ *  This is based on the "VHDX Format Specification v1.00", published 8/25/2012
  *  by Microsoft:
- *      https://www.microsoft.com/en-us/download/details.aspx?id=29681
+ *      https://www.microsoft.com/en-us/download/details.aspx?id=34750
  *
  * This work is licensed under the terms of the GNU LGPL, version 2 or later.
  * See the COPYING.LIB file in the top-level directory.
@@ -116,7 +116,7 @@ typedef struct QEMU_PACKED VHDXHeader {
                                            valid. */
     uint16_t    log_version;            /* version of the log format. Mustn't be
                                            zero, unless log_guid is also zero */
-    uint16_t    version;                /* version of th evhdx file.  Currently,
+    uint16_t    version;                /* version of the vhdx file.  Currently,
                                            only supported version is "1" */
     uint32_t    log_length;             /* length of the log.  Must be multiple
                                            of 1MB */
-- 
1.8.1.4


* [Qemu-devel] [PATCH 2/9] block: vhdx - add header update capability.
  2013-07-24 17:54 [Qemu-devel] [PATCH 0/9] VHDX log replay and write support Jeff Cody
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 1/9] block: vhdx - minor comments and typo correction Jeff Cody
@ 2013-07-24 17:54 ` Jeff Cody
  2013-07-26  6:49   ` Fam Zheng
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 3/9] block: vhdx code movement - VHDXMetadataEntries and BDRVVHDXState to header Jeff Cody
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Jeff Cody @ 2013-07-24 17:54 UTC
  To: qemu-devel; +Cc: kwolf, stefanha

This adds the ability to update the headers in a VHDX image, including
generating a new MS-compatible GUID.

As VHDX depends on uuid.h, VHDX is now a configurable build option.
Enabling VHDX support also enables uuid.  The default is to have VHDX
enabled.

To enable/disable VHDX:  --enable-vhdx, --disable-vhdx

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
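
The header update procedure described above can be sketched in isolation.
This is a minimal illustration, not the driver code: SketchHeader,
SketchState and the sketch_* functions are hypothetical stand-ins that keep
only the sequence number and the current-header index, and the disk write
and checksum steps are left as a comment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the driver state: only the
 * fields needed to show the alternation between headers are kept. */
typedef struct SketchHeader {
    uint64_t sequence_number;
} SketchHeader;

typedef struct SketchState {
    int curr_header;            /* index (0 or 1) of the current header */
    SketchHeader headers[2];
} SketchState;

/* One update: modify the non-current header, give it a sequence number
 * one larger than the current header's, then make it the current one. */
static void sketch_update_header(SketchState *s)
{
    int inactive;

    assert(s != NULL);
    assert(s->curr_header == 0 || s->curr_header == 1);

    inactive = 1 - s->curr_header;
    s->headers[inactive].sequence_number =
        s->headers[s->curr_header].sequence_number + 1;

    /* A real driver would recompute the checksum and write the header
     * to disk at this point, before switching over. */
    s->curr_header = inactive;
}

/* Two updates in a row touch both headers, so both end up with fresh
 * contents and the originally current header is current again. */
static void sketch_update_headers(SketchState *s)
{
    sketch_update_header(s);
    sketch_update_header(s);
}
```

Starting from sequence numbers 7 (current) and 3, a double update leaves
the headers at 9 (current) and 8, so a reader that picks the valid header
with the largest sequence number always finds the most recent one.
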
 block/Makefile.objs |   2 +-
 block/vhdx.c        | 157 +++++++++++++++++++++++++++++++++++++++++++++++++++-
 block/vhdx.h        |  12 +++-
 configure           |  13 +++++
 4 files changed, 180 insertions(+), 4 deletions(-)

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 4cf9aa4..e5e54e6 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -2,7 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
-block-obj-y += vhdx.o
+block-obj-$(CONFIG_VHDX) += vhdx.o
 block-obj-y += parallels.o blkdebug.o blkverify.o
 block-obj-y += snapshot.o qapi.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
diff --git a/block/vhdx.c b/block/vhdx.c
index 56bc88e..13e486d 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -21,6 +21,7 @@
 #include "qemu/crc32c.h"
 #include "block/vhdx.h"
 
+#include <uuid/uuid.h>
 
 /* Several metadata and region table data entries are identified by
  * guids in  a MS-specific GUID format. */
@@ -156,11 +157,40 @@ typedef struct BDRVVHDXState {
     VHDXBatEntry *bat;
     uint64_t bat_offset;
 
+    MSGUID session_guid;
+
+
     VHDXParentLocatorHeader parent_header;
     VHDXParentLocatorEntry *parent_entries;
 
 } BDRVVHDXState;
 
+/* Calculates new checksum.
+ *
+ * Zero is substituted during crc calculation for the original crc field
+ * crc_offset: byte offset in buf of the buffer crc
+ * buf: buffer pointer
+ * size: size of buffer (must be > crc_offset+4)
+ *
+ * Note: The resulting checksum is in the CPU endianness, not necessarily
+ *       in the file format endianness (LE).  Any header export to disk should
+ *       make sure that vhdx_header_le_export() is used to convert to the
+ *       correct endianness
+ */
+uint32_t vhdx_update_checksum(uint8_t *buf, size_t size, int crc_offset)
+{
+    uint32_t crc;
+
+    assert(buf != NULL);
+    assert(size > (crc_offset + 4));
+
+    memset(buf + crc_offset, 0, sizeof(crc));
+    crc =  crc32c(0xffffffff, buf, size);
+    memcpy(buf + crc_offset, &crc, sizeof(crc));
+
+    return crc;
+}
+
 uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size,
                             int crc_offset)
 {
@@ -212,6 +242,24 @@ bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset)
 
 
 /*
+ * This generates a UUID that is compliant with the MS GUIDs used
+ * in the VHDX spec (and elsewhere).
+ *
+ * We can do this with uuid_generate if uuid.h is present,
+ * however not all systems have uuid and the generation is
+ * pretty straightforward for the DCE + random usage case
+ *
+ */
+void vhdx_guid_generate(MSGUID *guid)
+{
+    uuid_t uuid;
+    assert(guid != NULL);
+
+    uuid_generate(uuid);
+    memcpy(guid, uuid, 16);
+}
+
+/*
  * Per the MS VHDX Specification, for every VHDX file:
  *      - The header section is fixed size - 1 MB
  *      - The header section is always the first "object"
@@ -249,6 +297,107 @@ static void vhdx_header_le_import(VHDXHeader *h)
     le64_to_cpus(&h->log_offset);
 }
 
+/* All VHDX structures on disk are little endian */
+static void vhdx_header_le_export(VHDXHeader *orig_h, VHDXHeader *new_h)
+{
+    assert(orig_h != NULL);
+    assert(new_h != NULL);
+
+    new_h->signature       = cpu_to_le32(orig_h->signature);
+    new_h->checksum        = cpu_to_le32(orig_h->checksum);
+    new_h->sequence_number = cpu_to_le64(orig_h->sequence_number);
+
+    memcpy(&new_h->file_write_guid, &orig_h->file_write_guid, sizeof(MSGUID));
+    memcpy(&new_h->data_write_guid, &orig_h->data_write_guid, sizeof(MSGUID));
+    memcpy(&new_h->log_guid,        &orig_h->log_guid,        sizeof(MSGUID));
+
+    cpu_to_leguids(&new_h->file_write_guid);
+    cpu_to_leguids(&new_h->data_write_guid);
+    cpu_to_leguids(&new_h->log_guid);
+
+    new_h->log_version     = cpu_to_le16(orig_h->log_version);
+    new_h->version         = cpu_to_le16(orig_h->version);
+    new_h->log_length      = cpu_to_le32(orig_h->log_length);
+    new_h->log_offset      = cpu_to_le64(orig_h->log_offset);
+}
+
+/* Update the VHDX headers
+ *
+ * This follows the VHDX spec procedures for header updates.
+ *
+ *  - non-current header is updated with largest sequence number
+ */
+static int vhdx_update_header(BlockDriverState *bs, BDRVVHDXState *s, bool rw)
+{
+    int ret = 0;
+    int hdr_idx = 0;
+    uint64_t header_offset = VHDX_HEADER1_OFFSET;
+
+    VHDXHeader *active_header;
+    VHDXHeader *inactive_header;
+    VHDXHeader header_le;
+    uint8_t *buffer;
+
+    /* operate on the non-current header */
+    if (s->curr_header == 0) {
+        hdr_idx = 1;
+        header_offset = VHDX_HEADER2_OFFSET;
+    }
+
+    active_header   = s->headers[s->curr_header];
+    inactive_header = s->headers[hdr_idx];
+
+    inactive_header->sequence_number = active_header->sequence_number + 1;
+
+    /* a new file guid must be generated before any file write, including
+     * headers */
+    memcpy(&inactive_header->file_write_guid, &s->session_guid,
+           sizeof(MSGUID));
+
+    /* a new data guid only needs to be generated before any guest-visible
+     * writes, so update it if the image is opened r/w. */
+    if (rw) {
+        vhdx_guid_generate(&inactive_header->data_write_guid);
+    }
+
+    /* the header checksum is not over just the packed size of VHDXHeader,
+     * but rather over the entire 'reserved' range for the header, which is
+     * 4KB (VHDX_HEADER_SIZE). */
+
+    buffer = qemu_blockalign(bs, VHDX_HEADER_SIZE);
+    /* we can't assume the extra reserved bytes are 0 */
+    ret = bdrv_pread(bs->file, header_offset, buffer, VHDX_HEADER_SIZE);
+    if (ret < 0) {
+        goto fail;
+    }
+    /* overwrite the actual VHDXHeader portion */
+    memcpy(buffer, inactive_header, sizeof(VHDXHeader));
+    inactive_header->checksum = vhdx_update_checksum(buffer,
+                                                     VHDX_HEADER_SIZE, 4);
+    vhdx_header_le_export(inactive_header, &header_le);
+    bdrv_pwrite_sync(bs->file, header_offset, &header_le, sizeof(VHDXHeader));
+    s->curr_header = hdr_idx;
+
+fail:
+    qemu_vfree(buffer);
+    return ret;
+}
+
+/*
+ * The VHDX spec calls for header updates to be performed twice, so that both
+ * the current and non-current header have valid info
+ */
+static int vhdx_update_headers(BlockDriverState *bs, BDRVVHDXState *s, bool rw)
+{
+    int ret;
+
+    ret = vhdx_update_header(bs, s, rw);
+    if (ret < 0) {
+        return ret;
+    }
+    ret = vhdx_update_header(bs, s, rw);
+    return ret;
+}
 
 /* opens the specified header block from the VHDX file header section */
 static int vhdx_parse_header(BlockDriverState *bs, BDRVVHDXState *s)
@@ -739,6 +888,11 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
         goto fail;
     }
 
+    /* This is used for any header updates, for the file_write_guid.
+     * The spec dictates that a new value should be used for the first
+     * header update */
+    vhdx_guid_generate(&s->session_guid);
+
     ret = vhdx_parse_header(bs, s);
     if (ret) {
         goto fail;
@@ -801,8 +955,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
     }
 
     if (flags & BDRV_O_RDWR) {
-        ret = -ENOTSUP;
-        goto fail;
+        vhdx_update_headers(bs, s, false);
     }
 
     /* TODO: differencing files, write */
diff --git a/block/vhdx.h b/block/vhdx.h
index 1dbb320..3999cb1 100644
--- a/block/vhdx.h
+++ b/block/vhdx.h
@@ -309,17 +309,27 @@ typedef struct QEMU_PACKED VHDXParentLocatorEntry {
 /* ----- END VHDX SPECIFICATION STRUCTURES ---- */
 
 
+void vhdx_guid_generate(MSGUID *guid);
+
+uint32_t vhdx_update_checksum(uint8_t *buf, size_t size, int crc_offset);
 uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size,
                             int crc_offset);
 
 bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset);
 
 
-static void leguid_to_cpus(MSGUID *guid)
+static inline void leguid_to_cpus(MSGUID *guid)
 {
     le32_to_cpus(&guid->data1);
     le16_to_cpus(&guid->data2);
     le16_to_cpus(&guid->data3);
 }
 
+static inline void cpu_to_leguids(MSGUID *guid)
+{
+    cpu_to_le32s(&guid->data1);
+    cpu_to_le16s(&guid->data2);
+    cpu_to_le16s(&guid->data3);
+}
+
 #endif
diff --git a/configure b/configure
index 877a821..821b790 100755
--- a/configure
+++ b/configure
@@ -244,6 +244,7 @@ gtk=""
 gtkabi="2.0"
 tpm="no"
 libssh2=""
+vhdx="yes"
 
 # parse CC options first
 for opt do
@@ -950,6 +951,11 @@ for opt do
   ;;
   --enable-libssh2) libssh2="yes"
   ;;
+  --enable-vhdx) vhdx="yes" ;
+                 uuid="yes"
+  ;;
+  --disable-vhdx) vhdx="no"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1166,6 +1172,8 @@ echo "  --gcov=GCOV              use specified gcov [$gcov_tool]"
 echo "  --enable-tpm             enable TPM support"
 echo "  --disable-libssh2        disable ssh block device support"
 echo "  --enable-libssh2         enable ssh block device support"
+echo "  --disable-vhdx           disable support for the Microsoft VHDX image format"
+echo "  --enable-vhdx            enable support for the Microsoft VHDX image format"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -3622,6 +3630,7 @@ echo "TPM support       $tpm"
 echo "libssh2 support   $libssh2"
 echo "TPM passthrough   $tpm_passthrough"
 echo "QOM debugging     $qom_cast_debug"
+echo "vhdx              $vhdx"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -4012,6 +4021,10 @@ if test "$virtio_blk_data_plane" = "yes" ; then
   echo 'CONFIG_VIRTIO_BLK_DATA_PLANE=$(CONFIG_VIRTIO)' >> $config_host_mak
 fi
 
+if test "$vhdx" = "yes" ; then
+  echo "CONFIG_VHDX=y" >> $config_host_mak
+fi
+
 # USB host support
 case "$usb" in
 linux)
-- 
1.8.1.4


* [Qemu-devel] [PATCH 3/9] block: vhdx code movement - VHDXMetadataEntries and BDRVVHDXState to header.
  2013-07-24 17:54 [Qemu-devel] [PATCH 0/9] VHDX log replay and write support Jeff Cody
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 1/9] block: vhdx - minor comments and typo correction Jeff Cody
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 2/9] block: vhdx - add header update capability Jeff Cody
@ 2013-07-24 17:54 ` Jeff Cody
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 4/9] block: vhdx - log support struct and defines Jeff Cody
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Jeff Cody @ 2013-07-24 17:54 UTC
  To: qemu-devel; +Cc: kwolf, stefanha

In preparation for VHDX log support, move these structures to the
header.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 block/vhdx.c | 51 ---------------------------------------------------
 block/vhdx.h | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 47 insertions(+), 51 deletions(-)

diff --git a/block/vhdx.c b/block/vhdx.c
index 13e486d..72af996 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -104,16 +104,6 @@ static const MSGUID parent_vhdx_guid = { .data1 = 0xb04aefb7,
      META_PAGE_83_PRESENT | META_LOGICAL_SECTOR_SIZE_PRESENT | \
      META_PHYS_SECTOR_SIZE_PRESENT)
 
-typedef struct VHDXMetadataEntries {
-    VHDXMetadataTableEntry file_parameters_entry;
-    VHDXMetadataTableEntry virtual_disk_size_entry;
-    VHDXMetadataTableEntry page83_data_entry;
-    VHDXMetadataTableEntry logical_sector_size_entry;
-    VHDXMetadataTableEntry phys_sector_size_entry;
-    VHDXMetadataTableEntry parent_locator_entry;
-    uint16_t present;
-} VHDXMetadataEntries;
-
 
 typedef struct VHDXSectorInfo {
     uint32_t bat_idx;       /* BAT entry index */
@@ -124,47 +114,6 @@ typedef struct VHDXSectorInfo {
     uint64_t block_offset;  /* block offset, in bytes */
 } VHDXSectorInfo;
 
-
-
-typedef struct BDRVVHDXState {
-    CoMutex lock;
-
-    int curr_header;
-    VHDXHeader *headers[2];
-
-    VHDXRegionTableHeader rt;
-    VHDXRegionTableEntry bat_rt;         /* region table for the BAT */
-    VHDXRegionTableEntry metadata_rt;    /* region table for the metadata */
-
-    VHDXMetadataTableHeader metadata_hdr;
-    VHDXMetadataEntries metadata_entries;
-
-    VHDXFileParameters params;
-    uint32_t block_size;
-    uint32_t block_size_bits;
-    uint32_t sectors_per_block;
-    uint32_t sectors_per_block_bits;
-
-    uint64_t virtual_disk_size;
-    uint32_t logical_sector_size;
-    uint32_t physical_sector_size;
-
-    uint64_t chunk_ratio;
-    uint32_t chunk_ratio_bits;
-    uint32_t logical_sector_size_bits;
-
-    uint32_t bat_entries;
-    VHDXBatEntry *bat;
-    uint64_t bat_offset;
-
-    MSGUID session_guid;
-
-
-    VHDXParentLocatorHeader parent_header;
-    VHDXParentLocatorEntry *parent_entries;
-
-} BDRVVHDXState;
-
 /* Calculates new checksum.
  *
  * Zero is substituted during crc calculation for the original crc field
diff --git a/block/vhdx.h b/block/vhdx.h
index 3999cb1..c8d8593 100644
--- a/block/vhdx.h
+++ b/block/vhdx.h
@@ -308,6 +308,53 @@ typedef struct QEMU_PACKED VHDXParentLocatorEntry {
 
 /* ----- END VHDX SPECIFICATION STRUCTURES ---- */
 
+typedef struct VHDXMetadataEntries {
+    VHDXMetadataTableEntry file_parameters_entry;
+    VHDXMetadataTableEntry virtual_disk_size_entry;
+    VHDXMetadataTableEntry page83_data_entry;
+    VHDXMetadataTableEntry logical_sector_size_entry;
+    VHDXMetadataTableEntry phys_sector_size_entry;
+    VHDXMetadataTableEntry parent_locator_entry;
+    uint16_t present;
+} VHDXMetadataEntries;
+
+typedef struct BDRVVHDXState {
+    CoMutex lock;
+
+    int curr_header;
+    VHDXHeader *headers[2];
+
+    VHDXRegionTableHeader rt;
+    VHDXRegionTableEntry bat_rt;         /* region table for the BAT */
+    VHDXRegionTableEntry metadata_rt;    /* region table for the metadata */
+
+    VHDXMetadataTableHeader metadata_hdr;
+    VHDXMetadataEntries metadata_entries;
+
+    VHDXFileParameters params;
+    uint32_t block_size;
+    uint32_t block_size_bits;
+    uint32_t sectors_per_block;
+    uint32_t sectors_per_block_bits;
+
+    uint64_t virtual_disk_size;
+    uint32_t logical_sector_size;
+    uint32_t physical_sector_size;
+
+    uint64_t chunk_ratio;
+    uint32_t chunk_ratio_bits;
+    uint32_t logical_sector_size_bits;
+
+    uint32_t bat_entries;
+    VHDXBatEntry *bat;
+    uint64_t bat_offset;
+
+    MSGUID session_guid;
+
+    VHDXParentLocatorHeader parent_header;
+    VHDXParentLocatorEntry *parent_entries;
+
+} BDRVVHDXState;
 
 void vhdx_guid_generate(MSGUID *guid);
 
-- 
1.8.1.4


* [Qemu-devel] [PATCH 4/9] block: vhdx - log support struct and defines
  2013-07-24 17:54 [Qemu-devel] [PATCH 0/9] VHDX log replay and write support Jeff Cody
                   ` (2 preceding siblings ...)
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 3/9] block: vhdx code movement - VHDXMetadataEntries and BDRVVHDXState to header Jeff Cody
@ 2013-07-24 17:54 ` Jeff Cody
  2013-07-30  3:15   ` Fam Zheng
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 5/9] block: vhdx - break endian translation functions out Jeff Cody
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: Jeff Cody @ 2013-07-24 17:54 UTC
  To: qemu-devel; +Cc: kwolf, stefanha

This adds some magic number defines, and internal structure
definitions for VHDX log replay support.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 block/vhdx.h | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/block/vhdx.h b/block/vhdx.h
index c8d8593..2db6615 100644
--- a/block/vhdx.h
+++ b/block/vhdx.h
@@ -151,7 +151,10 @@ typedef struct QEMU_PACKED VHDXRegionTableEntry {
 
 
 /* ---- LOG ENTRY STRUCTURES ---- */
+#define VHDX_LOG_MIN_SIZE (1024*1024)
+#define VHDX_LOG_SECTOR_SIZE 4096
 #define VHDX_LOG_HDR_SIZE 64
+#define VHDX_LOG_SIGNATURE 0x65676f6c
 typedef struct QEMU_PACKED VHDXLogEntryHeader {
     uint32_t    signature;              /* "loge" in ASCII */
     uint32_t    checksum;               /* CRC-32C hash of the 64KB table */
@@ -174,7 +177,8 @@ typedef struct QEMU_PACKED VHDXLogEntryHeader {
 } VHDXLogEntryHeader;
 
 #define VHDX_LOG_DESC_SIZE 32
-
+#define VHDX_LOG_DESC_SIGNATURE 0x63736564
+#define VHDX_LOG_ZERO_SIGNATURE 0x6f72657a
 typedef struct QEMU_PACKED VHDXLogDescriptor {
     uint32_t    signature;              /* "zero" or "desc" in ASCII */
     union  {
@@ -194,6 +198,7 @@ typedef struct QEMU_PACKED VHDXLogDescriptor {
                                            vhdx_log_entry_header */
 } VHDXLogDescriptor;
 
+#define VHDX_LOG_DATA_SIGNATURE 0x61746164
 typedef struct QEMU_PACKED VHDXLogDataSector {
     uint32_t    data_signature;         /* "data" in ASCII */
     uint32_t    sequence_high;          /* 4 MSB of 8 byte sequence_number */
@@ -318,6 +323,18 @@ typedef struct VHDXMetadataEntries {
     uint16_t present;
 } VHDXMetadataEntries;
 
+typedef struct VHDXLogEntries {
+    uint64_t offset;
+    uint64_t length;
+    uint32_t head;
+    uint32_t tail;
+} VHDXLogEntries;
+
+typedef struct VHDXLogEntryInfo {
+    uint64_t sector_start;
+    uint32_t desc_count;
+} VHDXLogEntryInfo;
+
 typedef struct BDRVVHDXState {
     CoMutex lock;
 
@@ -351,6 +368,8 @@ typedef struct BDRVVHDXState {
 
     MSGUID session_guid;
 
+    VHDXLogEntries log;
+
     VHDXParentLocatorHeader parent_header;
     VHDXParentLocatorEntry *parent_entries;
 
-- 
1.8.1.4


* [Qemu-devel] [PATCH 5/9] block: vhdx - break endian translation functions out
  2013-07-24 17:54 [Qemu-devel] [PATCH 0/9] VHDX log replay and write support Jeff Cody
                   ` (3 preceding siblings ...)
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 4/9] block: vhdx - log support struct and defines Jeff Cody
@ 2013-07-24 17:54 ` Jeff Cody
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 6/9] block: vhdx - update log guid in header, and first write tracker Jeff Cody
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Jeff Cody @ 2013-07-24 17:54 UTC
  To: qemu-devel; +Cc: kwolf, stefanha

This moves the endian translation functions out of vhdx.c and into a
separate source file.  New endian translation functions for log support
are added as well.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 block/Makefile.objs |   2 +-
 block/vhdx-endian.c | 141 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/vhdx.c        |  43 ----------------
 block/vhdx.h        |  13 +++++
 4 files changed, 155 insertions(+), 44 deletions(-)
 create mode 100644 block/vhdx-endian.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index e5e54e6..e6f5d33 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -2,7 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
-block-obj-$(CONFIG_VHDX) += vhdx.o
+block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o
 block-obj-y += parallels.o blkdebug.o blkverify.o
 block-obj-y += snapshot.o qapi.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
diff --git a/block/vhdx-endian.c b/block/vhdx-endian.c
new file mode 100644
index 0000000..f7a59c5
--- /dev/null
+++ b/block/vhdx-endian.c
@@ -0,0 +1,141 @@
+/*
+ * Block driver for Hyper-V VHDX Images
+ *
+ * Copyright (c) 2013 Red Hat, Inc.,
+ *
+ * Authors:
+ *  Jeff Cody <jcody@redhat.com>
+ *
+ *  This is based on the "VHDX Format Specification v1.00", published 8/25/2012
+ *  by Microsoft:
+ *      https://www.microsoft.com/en-us/download/details.aspx?id=34750
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include "block/vhdx.h"
+
+#include <uuid/uuid.h>
+
+
+/*
+ * All the VHDX formats on disk are little endian - the following
+ * are helper import/export functions to correctly convert
+ * endianness from disk read to native cpu format, and back again.
+ */
+
+
+/* VHDX File Header */
+
+
+void vhdx_header_le_import(VHDXHeader *h)
+{
+    assert(h != NULL);
+
+    le32_to_cpus(&h->signature);
+    le32_to_cpus(&h->checksum);
+    le64_to_cpus(&h->sequence_number);
+
+    leguid_to_cpus(&h->file_write_guid);
+    leguid_to_cpus(&h->data_write_guid);
+    leguid_to_cpus(&h->log_guid);
+
+    le16_to_cpus(&h->log_version);
+    le16_to_cpus(&h->version);
+    le32_to_cpus(&h->log_length);
+    le64_to_cpus(&h->log_offset);
+}
+
+void vhdx_header_le_export(VHDXHeader *orig_h, VHDXHeader *new_h)
+{
+    assert(orig_h != NULL);
+    assert(new_h != NULL);
+
+    new_h->signature       = cpu_to_le32(orig_h->signature);
+    new_h->checksum        = cpu_to_le32(orig_h->checksum);
+    new_h->sequence_number = cpu_to_le64(orig_h->sequence_number);
+
+    memcpy(&new_h->file_write_guid, &orig_h->file_write_guid, sizeof(MSGUID));
+    memcpy(&new_h->data_write_guid, &orig_h->data_write_guid, sizeof(MSGUID));
+    memcpy(&new_h->log_guid,        &orig_h->log_guid,        sizeof(MSGUID));
+
+    cpu_to_leguids(&new_h->file_write_guid);
+    cpu_to_leguids(&new_h->data_write_guid);
+    cpu_to_leguids(&new_h->log_guid);
+
+    new_h->log_version     = cpu_to_le16(orig_h->log_version);
+    new_h->version         = cpu_to_le16(orig_h->version);
+    new_h->log_length      = cpu_to_le32(orig_h->log_length);
+    new_h->log_offset      = cpu_to_le64(orig_h->log_offset);
+}
+
+
+/* VHDX Log Headers */
+
+
+void vhdx_log_desc_le_import(VHDXLogDescriptor *d)
+{
+    assert(d != NULL);
+
+    le32_to_cpus(&d->signature);
+    le32_to_cpus(&d->trailing_bytes);
+    le64_to_cpus(&d->leading_bytes);
+    le64_to_cpus(&d->file_offset);
+    le64_to_cpus(&d->sequence_number);
+}
+
+void vhdx_log_desc_le_export(VHDXLogDescriptor *d)
+{
+    assert(d != NULL);
+
+    cpu_to_le32s(&d->signature);
+    cpu_to_le32s(&d->trailing_bytes);
+    cpu_to_le64s(&d->leading_bytes);
+    cpu_to_le64s(&d->file_offset);
+    cpu_to_le64s(&d->sequence_number);
+}
+
+void vhdx_log_data_le_export(VHDXLogDataSector *d)
+{
+    assert(d != NULL);
+
+    cpu_to_le32s(&d->data_signature);
+    cpu_to_le32s(&d->sequence_high);
+    cpu_to_le32s(&d->sequence_low);
+}
+
+void vhdx_log_entry_hdr_le_import(VHDXLogEntryHeader *hdr)
+{
+    assert(hdr != NULL);
+
+    le32_to_cpus(&hdr->signature);
+    le32_to_cpus(&hdr->checksum);
+    le32_to_cpus(&hdr->entry_length);
+    le32_to_cpus(&hdr->tail);
+    le64_to_cpus(&hdr->sequence_number);
+    le32_to_cpus(&hdr->descriptor_count);
+    leguid_to_cpus(&hdr->log_guid);
+    le64_to_cpus(&hdr->flushed_file_offset);
+    le64_to_cpus(&hdr->last_file_offset);
+}
+
+void vhdx_log_entry_hdr_le_export(VHDXLogEntryHeader *hdr)
+{
+    assert(hdr != NULL);
+
+    cpu_to_le32s(&hdr->signature);
+    cpu_to_le32s(&hdr->checksum);
+    cpu_to_le32s(&hdr->entry_length);
+    cpu_to_le32s(&hdr->tail);
+    cpu_to_le64s(&hdr->sequence_number);
+    cpu_to_le32s(&hdr->descriptor_count);
+    cpu_to_le64s(&hdr->flushed_file_offset);
+    cpu_to_le64s(&hdr->last_file_offset);
+    cpu_to_leguids(&hdr->log_guid);
+}
+
+
diff --git a/block/vhdx.c b/block/vhdx.c
index 72af996..9f7f04f 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -227,49 +227,6 @@ static int vhdx_probe(const uint8_t *buf, int buf_size, const char *filename)
     return 0;
 }
 
-/* All VHDX structures on disk are little endian */
-static void vhdx_header_le_import(VHDXHeader *h)
-{
-    assert(h != NULL);
-
-    le32_to_cpus(&h->signature);
-    le32_to_cpus(&h->checksum);
-    le64_to_cpus(&h->sequence_number);
-
-    leguid_to_cpus(&h->file_write_guid);
-    leguid_to_cpus(&h->data_write_guid);
-    leguid_to_cpus(&h->log_guid);
-
-    le16_to_cpus(&h->log_version);
-    le16_to_cpus(&h->version);
-    le32_to_cpus(&h->log_length);
-    le64_to_cpus(&h->log_offset);
-}
-
-/* All VHDX structures on disk are little endian */
-static void vhdx_header_le_export(VHDXHeader *orig_h, VHDXHeader *new_h)
-{
-    assert(orig_h != NULL);
-    assert(new_h != NULL);
-
-    new_h->signature       = cpu_to_le32(orig_h->signature);
-    new_h->checksum        = cpu_to_le32(orig_h->checksum);
-    new_h->sequence_number = cpu_to_le64(orig_h->sequence_number);
-
-    memcpy(&new_h->file_write_guid, &orig_h->file_write_guid, sizeof(MSGUID));
-    memcpy(&new_h->data_write_guid, &orig_h->data_write_guid, sizeof(MSGUID));
-    memcpy(&new_h->log_guid,        &orig_h->log_guid,        sizeof(MSGUID));
-
-    cpu_to_leguids(&new_h->file_write_guid);
-    cpu_to_leguids(&new_h->data_write_guid);
-    cpu_to_leguids(&new_h->log_guid);
-
-    new_h->log_version     = cpu_to_le16(orig_h->log_version);
-    new_h->version         = cpu_to_le16(orig_h->version);
-    new_h->log_length      = cpu_to_le32(orig_h->log_length);
-    new_h->log_offset      = cpu_to_le64(orig_h->log_offset);
-}
-
 /* Update the VHDX headers
  *
  * This follows the VHDX spec procedures for header updates.
diff --git a/block/vhdx.h b/block/vhdx.h
index 2db6615..5e0a1d3 100644
--- a/block/vhdx.h
+++ b/block/vhdx.h
@@ -398,4 +398,17 @@ static inline void cpu_to_leguids(MSGUID *guid)
     cpu_to_le16s(&guid->data3);
 }
 
+void vhdx_header_le_import(VHDXHeader *h);
+void vhdx_header_le_export(VHDXHeader *orig_h, VHDXHeader *new_h);
+void vhdx_log_desc_le_import(VHDXLogDescriptor *d);
+void vhdx_log_desc_le_export(VHDXLogDescriptor *d);
+void vhdx_log_data_le_export(VHDXLogDataSector *d);
+void vhdx_log_entry_hdr_le_import(VHDXLogEntryHeader *hdr);
+void vhdx_log_entry_hdr_le_export(VHDXLogEntryHeader *hdr);
+
+
+
+
+
+
 #endif
-- 
1.8.1.4


* [Qemu-devel] [PATCH 6/9] block: vhdx - update log guid in header, and first write tracker
  2013-07-24 17:54 [Qemu-devel] [PATCH 0/9] VHDX log replay and write support Jeff Cody
                   ` (4 preceding siblings ...)
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 5/9] block: vhdx - break endian translation functions out Jeff Cody
@ 2013-07-24 17:54 ` Jeff Cody
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 7/9] block: vhdx - log parsing, replay, and flush support Jeff Cody
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: Jeff Cody @ 2013-07-24 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, stefanha

Allow tracking of the first guest-visible write to the VHDX image, as well
as the ability to update the log GUID in the header.  This is in preparation
for log support.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 block/vhdx.c | 28 +++++++++++++++++++++++-----
 block/vhdx.h |  7 +++++--
 2 files changed, 28 insertions(+), 7 deletions(-)

diff --git a/block/vhdx.c b/block/vhdx.c
index 9f7f04f..f5689c3 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -233,7 +233,8 @@ static int vhdx_probe(const uint8_t *buf, int buf_size, const char *filename)
  *
  *  - non-current header is updated with largest sequence number
  */
-static int vhdx_update_header(BlockDriverState *bs, BDRVVHDXState *s, bool rw)
+static int vhdx_update_header(BlockDriverState *bs, BDRVVHDXState *s, bool rw,
+                              MSGUID *log_guid)
 {
     int ret = 0;
     int hdr_idx = 0;
@@ -266,6 +267,11 @@ static int vhdx_update_header(BlockDriverState *bs, BDRVVHDXState *s, bool rw)
         vhdx_guid_generate(&inactive_header->data_write_guid);
     }
 
+    /* update the log guid if present */
+    if (log_guid) {
+        memcpy(&inactive_header->log_guid, log_guid, sizeof(MSGUID));
+    }
+
     /* the header checksum is not over just the packed size of VHDXHeader,
      * but rather over the entire 'reserved' range for the header, which is
      * 4KB (VHDX_HEADER_SIZE). */
@@ -293,15 +299,16 @@ fail:
  * The VHDX spec calls for header updates to be performed twice, so that both
  * the current and non-current header have valid info
  */
-static int vhdx_update_headers(BlockDriverState *bs, BDRVVHDXState *s, bool rw)
+int vhdx_update_headers(BlockDriverState *bs, BDRVVHDXState *s, bool rw,
+                        MSGUID *log_guid)
 {
     int ret;
 
-    ret = vhdx_update_header(bs, s, rw);
+    ret = vhdx_update_header(bs, s, rw, log_guid);
     if (ret < 0) {
         return ret;
     }
-    ret = vhdx_update_header(bs, s, rw);
+    ret = vhdx_update_header(bs, s, rw, log_guid);
     return ret;
 }
 
@@ -781,6 +788,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
 
 
     s->bat = NULL;
+    s->first_visible_write = true;
 
     qemu_co_mutex_init(&s->lock);
 
@@ -861,7 +869,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
     }
 
     if (flags & BDRV_O_RDWR) {
-        vhdx_update_headers(bs, s, false);
+        vhdx_update_headers(bs, s, false, NULL);
     }
 
     /* TODO: differencing files, write */
@@ -998,6 +1006,16 @@ exit:
 
 
 
+/* Per the spec, on the first write of guest-visible data to the file the
+ * data write guid must be updated in the header */
+void vhdx_user_visible_write(BlockDriverState *bs, BDRVVHDXState *s)
+{
+    if (s->first_visible_write) {
+        s->first_visible_write = false;
+        vhdx_update_headers(bs, s, true, NULL);
+    }
+}
+
 static coroutine_fn int vhdx_co_writev(BlockDriverState *bs, int64_t sector_num,
                                       int nb_sectors, QEMUIOVector *qiov)
 {
diff --git a/block/vhdx.h b/block/vhdx.h
index 5e0a1d3..cb3ce0e 100644
--- a/block/vhdx.h
+++ b/block/vhdx.h
@@ -366,6 +366,7 @@ typedef struct BDRVVHDXState {
     VHDXBatEntry *bat;
     uint64_t bat_offset;
 
+    bool first_visible_write;
     MSGUID session_guid;
 
     VHDXLogEntries log;
@@ -377,6 +378,9 @@ typedef struct BDRVVHDXState {
 
 void vhdx_guid_generate(MSGUID *guid);
 
+int vhdx_update_headers(BlockDriverState *bs, BDRVVHDXState *s, bool rw,
+                        MSGUID *log_guid);
+
 uint32_t vhdx_update_checksum(uint8_t *buf, size_t size, int crc_offset);
 uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size,
                             int crc_offset);
@@ -406,8 +410,7 @@ void vhdx_log_data_le_export(VHDXLogDataSector *d);
 void vhdx_log_entry_hdr_le_import(VHDXLogEntryHeader *hdr);
 void vhdx_log_entry_hdr_le_export(VHDXLogEntryHeader *hdr);
 
-
-
+void vhdx_user_visible_write(BlockDriverState *bs, BDRVVHDXState *s);
 
 
 
-- 
1.8.1.4


* [Qemu-devel] [PATCH 7/9] block: vhdx - log parsing, replay, and flush support
  2013-07-24 17:54 [Qemu-devel] [PATCH 0/9] VHDX log replay and write support Jeff Cody
                   ` (5 preceding siblings ...)
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 6/9] block: vhdx - update log guid in header, and first write tracker Jeff Cody
@ 2013-07-24 17:54 ` Jeff Cody
  2013-07-30  3:48   ` Fam Zheng
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 8/9] block: vhdx - add log write support Jeff Cody
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 9/9] block: vhdx " Jeff Cody
  8 siblings, 1 reply; 19+ messages in thread
From: Jeff Cody @ 2013-07-24 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, stefanha

This adds support for VHDX v0 logs, as specified in Microsoft's
VHDX Format Specification v1.00:
https://www.microsoft.com/en-us/download/details.aspx?id=34750

The following support is added:

* Log parsing and validation - validate that an existing log
  is correct.

* Log search - search through an existing log, to find any valid
  sequence of entries.

* Log replay and flush - replay an existing log, and flush/clear
  the log when complete.

The VHDX log is a circular buffer, with elements (sectors) of 4KB.
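
As a rough illustration of the circular indexing (a standalone sketch, not
code from this patch; the names are made up for the example), advancing a
byte index by one 4KB sector and wrapping at the end of the buffer could
look like this.  It assumes the buffer length is a non-zero multiple of the
sector size:

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_LOG_SECTOR_SIZE 4096u   /* each log sector is 4KB */

/* Advance a byte index by one sector, wrapping to 0 at the end of the
 * circular buffer.  'length' is assumed to be a non-zero multiple of the
 * sector size. */
static uint32_t example_log_next_sector(uint32_t idx, uint64_t length)
{
    uint64_t next = (uint64_t)idx + EXAMPLE_LOG_SECTOR_SIZE;

    assert(length != 0 && length % EXAMPLE_LOG_SECTOR_SIZE == 0);
    return next >= length ? 0 : (uint32_t)next;
}
```

The addition is done in 64 bits so that an index near UINT32_MAX cannot
overflow before the wrap check.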

A log entry is a variable-length run of sectors, comprising a header
and 'descriptors' that describe each sector.
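
To make the entry layout concrete, here is a small standalone sketch (not
the patch's code; names are illustrative) of how many 4KB sectors the header
plus descriptors occupy.  It relies on the sizes given in the patch's own
comments: a 64-byte entry header and 32-byte descriptors, so the header
takes the space of two descriptor slots and a sector holds 128 slots:

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_SECTOR_SIZE 4096u
#define EXAMPLE_HDR_SIZE      64u   /* entry header, first sector only */
#define EXAMPLE_DESC_SIZE     32u   /* one descriptor */

/* Number of sectors needed for the entry header plus 'desc_cnt'
 * descriptors.  Never returns 0: the header always needs one sector. */
static uint32_t example_desc_sectors(uint32_t desc_cnt)
{
    const uint64_t slots_per_sector = EXAMPLE_SECTOR_SIZE / EXAMPLE_DESC_SIZE;
    uint64_t slots;

    assert(EXAMPLE_HDR_SIZE % EXAMPLE_DESC_SIZE == 0);
    /* the header consumes the first two 32-byte slots */
    slots = (uint64_t)desc_cnt + EXAMPLE_HDR_SIZE / EXAMPLE_DESC_SIZE;
    return (uint32_t)((slots + slots_per_sector - 1) / slots_per_sector);
}
```

With these sizes, up to 126 descriptors fit in the first sector, and each
additional sector holds up to 128 more.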

A log may contain multiple entries, known as a log sequence.  In a log
sequence, each log entry immediately follows the previous entry, with an
incrementing sequence number.  There can only ever be one active and
valid sequence in the log.
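
The "incrementing sequence number" rule can be sketched as follows.  This
is a simplified, hypothetical helper that works on an in-memory array of
sequence numbers rather than on-disk entries, and it only checks the
numbering (the real validation also covers signatures, GUID and checksum).
It treats a sequence number of 0 as invalid, as the spec requires:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count how many entries, starting at seqs[0], form one contiguous log
 * sequence: each sequence number must be exactly one more than the
 * previous one.  Returns 0 if the first entry is invalid. */
static size_t example_sequence_run(const uint64_t *seqs, size_t n)
{
    size_t run;

    assert(seqs != NULL || n == 0);
    if (n == 0 || seqs[0] == 0) {
        return 0;   /* empty input, or invalid sequence number 0 */
    }
    for (run = 1; run < n; run++) {
        if (seqs[run] != seqs[run - 1] + 1) {
            break;  /* gap or repeat: the sequence ends here */
        }
    }
    return run;
}
```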

Each log entry must match the file log GUID in order to be valid (along
with other criteria).  Once we have flushed all valid log entries, we
set the file log GUID to zero, which indicates a buffer with no
valid entries.
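
A minimal sketch of the "zero GUID means empty log" test is shown below.
The struct is a stand-in defined only for this example (it mirrors the
usual 16-byte GUID layout and is not the MSGUID definition from vhdx.h):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in 16-byte GUID type, for illustration only */
typedef struct ExampleGuid {
    uint32_t data1;
    uint16_t data2;
    uint16_t data3;
    uint8_t  data4[8];
} ExampleGuid;

/* Returns true if every byte of the GUID is zero, i.e. the log is
 * considered empty. */
static bool example_guid_is_zero(const ExampleGuid *guid)
{
    const uint8_t *bytes = (const uint8_t *)guid;
    uint8_t acc = 0;
    size_t i;

    assert(guid != NULL);
    for (i = 0; i < sizeof(*guid); i++) {
        acc |= bytes[i];    /* any non-zero byte makes acc non-zero */
    }
    return acc == 0;
}
```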

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 block/Makefile.objs |   2 +-
 block/vhdx-log.c    | 734 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/vhdx.c        |  44 +---
 block/vhdx.h        |   7 +-
 4 files changed, 743 insertions(+), 44 deletions(-)
 create mode 100644 block/vhdx-log.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index e6f5d33..2fbd79a 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -2,7 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
-block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o
+block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
 block-obj-y += parallels.o blkdebug.o blkverify.o
 block-obj-y += snapshot.o qapi.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
diff --git a/block/vhdx-log.c b/block/vhdx-log.c
new file mode 100644
index 0000000..89b9000
--- /dev/null
+++ b/block/vhdx-log.c
@@ -0,0 +1,734 @@
+/*
+ * Block driver for Hyper-V VHDX Images
+ *
+ * Copyright (c) 2013 Red Hat, Inc.,
+ *
+ * Authors:
+ *  Jeff Cody <jcody@redhat.com>
+ *
+ *  This is based on the "VHDX Format Specification v1.00", published 8/25/2012
+ *  by Microsoft:
+ *      https://www.microsoft.com/en-us/download/details.aspx?id=34750
+ *
+ * This file covers the functionality of the metadata log writing, parsing, and
+ * replay.
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+#include "qemu-common.h"
+#include "block/block_int.h"
+#include "qemu/module.h"
+#include "block/vhdx.h"
+
+
+typedef struct VHDXLogSequence {
+    bool valid;
+    uint32_t count;
+    VHDXLogEntries log;
+    VHDXLogEntryHeader hdr;
+} VHDXLogSequence;
+
+typedef struct VHDXLogDescEntries {
+    VHDXLogEntryHeader hdr;
+    VHDXLogDescriptor desc[];
+} VHDXLogDescEntries;
+
+
+/* Returns true if the GUID is zero */
+static bool vhdx_log_guid_is_zero(MSGUID *guid)
+{
+    int i;
+    int ret = 0;
+
+    /* OR every byte of the GUID together; the result is zero only
+     * if all bytes are zero */
+    for (i = 0; i < sizeof(MSGUID); i++) {
+        ret |= ((uint8_t *) guid)[i];
+    }
+
+    return ret == 0;
+}
+
+/* The log located on the disk is a circular buffer containing
+ * sectors of 4096 bytes each.
+ *
+ * It is assumed for the read/write functions below that the
+ * circular buffer scheme uses a 'one sector open' to indicate
+ * the buffer is full.  Given the validation methods used for each
+ * sector, this method should be compatible with other methods that
+ * do not waste a sector.
+ */
+
+
+/* Allow peeking at the hdr entry at the beginning of the current
+ * read index, without advancing the read index */
+static int vhdx_log_peek_hdr(BlockDriverState *bs, VHDXLogEntries *log,
+                             VHDXLogEntryHeader *hdr)
+{
+    int ret = 0;
+    uint64_t offset;
+    uint32_t read;
+
+    assert(hdr != NULL);
+
+    /* peek is only supported on sector boundaries */
+    if (log->read % VHDX_LOG_SECTOR_SIZE) {
+        ret = -EFAULT;
+        goto exit;
+    }
+
+    read = log->read;
+    /* we are guaranteed that a) log sectors are 4096 bytes,
+     * and b) the log length is a multiple of 1MB. So, there
+     * is always a round number of sectors in the buffer */
+    if ((read + sizeof(VHDXLogEntryHeader)) > log->length) {
+        read = 0;
+    }
+
+    if (read == log->write) {
+        ret = -EINVAL;
+        goto exit;
+    }
+
+    offset = log->offset + read;
+
+    ret = bdrv_pread(bs->file, offset, hdr, sizeof(VHDXLogEntryHeader));
+    if (ret < 0) {
+        goto exit;
+    }
+
+exit:
+    return ret;
+}
+
+/* Index increment for log, based on sector boundaries */
+static int vhdx_log_inc_idx(uint32_t idx, uint64_t length)
+{
+    idx += VHDX_LOG_SECTOR_SIZE;
+    /* we are guaranteed that a) log sectors are 4096 bytes,
+     * and b) the log length is a multiple of 1MB. So, there
+     * is always a round number of sectors in the buffer */
+    return idx >= length ? 0 : idx;
+}
+
+
+/* Reset the log to empty */
+static void vhdx_log_reset(BlockDriverState *bs, BDRVVHDXState *s)
+{
+    MSGUID guid = { 0 };
+    s->log.read = s->log.write = 0;
+    /* a log guid of 0 indicates an empty log to any parser of v0
+     * VHDX logs */
+    vhdx_update_headers(bs, s, false, &guid);
+}
+
+/* Reads num_sectors from the log (all log sectors are 4096 bytes),
+ * into buffer 'buffer'.  Upon return, *sectors_read will contain
+ * the number of sectors successfully read.
+ *
+ * It is assumed that 'buffer' is already allocated, and of sufficient
+ * size (i.e. >= 4096*num_sectors).
+ *
+ * If 'peek' is true, then the tail (read) pointer for the circular buffer is
+ * not modified.
+ *
+ * 0 is returned on success, -errno otherwise.  */
+static int vhdx_log_read_sectors(BlockDriverState *bs, VHDXLogEntries *log,
+                                 uint32_t *sectors_read, void *buffer,
+                                 uint32_t num_sectors, bool peek)
+{
+    int ret = 0;
+    uint64_t offset;
+    uint32_t read;
+
+    read = log->read;
+
+    *sectors_read = 0;
+    while (num_sectors) {
+        if (read == log->write) {
+            /* empty */
+            break;
+        }
+        offset = log->offset + read;
+
+        ret = bdrv_pread(bs->file, offset, buffer, VHDX_LOG_SECTOR_SIZE);
+        if (ret < 0) {
+            goto exit;
+        }
+        read = vhdx_log_inc_idx(read, log->length);
+
+        *sectors_read = *sectors_read + 1;
+        num_sectors--;
+    }
+
+exit:
+    if (!peek) {
+        log->read = read;
+    }
+    return ret;
+}
+
+/* Validates a log entry header */
+static bool vhdx_log_hdr_is_valid(VHDXLogEntries *log, VHDXLogEntryHeader *hdr,
+                                  BDRVVHDXState *s)
+{
+    bool valid = false;
+
+    if (memcmp(&hdr->signature, "loge", 4)) {
+        goto exit;
+    }
+
+    /* if the individual entry length is larger than the whole log
+     * buffer, that is obviously invalid */
+    if (log->length < hdr->entry_length) {
+        goto exit;
+    }
+
+    /* length of entire entry must be in units of 4KB (log sector size) */
+    if (hdr->entry_length % (VHDX_LOG_SECTOR_SIZE)) {
+        goto exit;
+    }
+
+    /* per spec, sequence # must be > 0 */
+    if (hdr->sequence_number == 0) {
+        goto exit;
+    }
+
+    /* log entries are only valid if they match the file-wide log guid
+     * found in the active header */
+    if (!guid_eq(hdr->log_guid, s->headers[s->curr_header]->log_guid)) {
+        goto exit;
+    }
+
+    valid = true;
+
+exit:
+    return valid;
+}
+
+/*
+ * Given a log header, this will validate the descriptors and the
+ * corresponding data sectors (if applicable)
+ *
+ * Validation consists of:
+ *      1. Making sure the sequence number matches the entry header
+ *      2. Verifying a valid signature ('zero' or 'desc' for descriptors)
+ *      3. File offset field is a multiple of 4KB
+ *      4. If a data descriptor, the corresponding data sector
+ *         has its signature ('data') and matching sequence number
+ *
+ * 'desc' is the data buffer containing the descriptor
+ * hdr is the log entry header
+ *
+ * Returns true if valid
+ */
+static bool vhdx_log_desc_is_valid(VHDXLogDescriptor *desc,
+                                   VHDXLogEntryHeader *hdr)
+{
+    bool ret = false;
+
+    if (desc->sequence_number != hdr->sequence_number) {
+        goto exit;
+    }
+    if (desc->file_offset % VHDX_LOG_SECTOR_SIZE) {
+        goto exit;
+    }
+
+    if (!memcmp(&desc->signature, "zero", 4)) {
+        if (desc->zero_length % VHDX_LOG_SECTOR_SIZE == 0) {
+            /* valid */
+            ret = true;
+        }
+    } else if (!memcmp(&desc->signature, "desc", 4)) {
+            /* valid */
+            ret = true;
+    }
+
+exit:
+    return ret;
+}
+
+
+/* Prior to sector data for a log entry, there is the header
+ * and the descriptors referenced in the header:
+ *
+ * [] = 4KB sector
+ *
+ * [ hdr, desc ][   desc   ][ ... ][ data ][ ... ]
+ *
+ * The first sector in a log entry has a 64 byte header, and
+ * up to 126 32-byte descriptors.  If more descriptors than
+ * 126 are required, then subsequent sectors can have up to 128
+ * descriptors.  Each sector is 4KB.  Data follows the descriptor
+ * sectors.
+ *
+ * This will return the number of sectors needed to encompass
+ * the passed number of descriptors in desc_cnt.
+ *
+ * This will never return 0, even if desc_cnt is 0.
+ */
+static int vhdx_compute_desc_sectors(uint32_t desc_cnt)
+{
+    uint32_t desc_sectors;
+
+    desc_cnt += 2; /* account for header in first sector */
+    desc_sectors = desc_cnt / 128;
+    if (desc_cnt % 128) {
+        desc_sectors++;
+    }
+
+    return desc_sectors;
+}
+
+
+/* Reads the log header, and subsequent descriptors (if any).  This
+ * will allocate all the space for buffer, which must be NULL when
+ * passed into this function. Each descriptor will also be validated,
+ * and error returned if any are invalid. */
+static int vhdx_log_read_desc(BlockDriverState *bs, BDRVVHDXState *s,
+                              VHDXLogEntries *log, VHDXLogDescEntries **buffer)
+{
+    int ret = 0;
+    uint32_t desc_sectors;
+    uint32_t sectors_read;
+    VHDXLogEntryHeader hdr;
+    VHDXLogDescEntries *desc_entries = NULL;
+    int i;
+
+    assert(*buffer == NULL);
+
+    ret = vhdx_log_peek_hdr(bs, log, &hdr);
+    if (ret < 0) {
+        goto exit;
+    }
+    vhdx_log_entry_hdr_le_import(&hdr);
+    if (vhdx_log_hdr_is_valid(log, &hdr, s) == false) {
+        ret = -EINVAL;
+        goto exit;
+    }
+
+    desc_sectors = vhdx_compute_desc_sectors(hdr.descriptor_count);
+    desc_entries = qemu_blockalign(bs, desc_sectors * VHDX_LOG_SECTOR_SIZE);
+
+    ret = vhdx_log_read_sectors(bs, log, &sectors_read, desc_entries,
+                                desc_sectors, false);
+    if (ret < 0) {
+        goto free_and_exit;
+    }
+    if (sectors_read != desc_sectors) {
+        ret = -EINVAL;
+        goto free_and_exit;
+    }
+
+    /* put in proper endianness, and validate each desc */
+    for (i = 0; i < hdr.descriptor_count; i++) {
+        vhdx_log_desc_le_import(&desc_entries->desc[i]);
+        if (vhdx_log_desc_is_valid(&desc_entries->desc[i], &hdr) == false) {
+            ret = -EINVAL;
+            goto free_and_exit;
+        }
+    }
+
+    *buffer = desc_entries;
+    goto exit;
+
+free_and_exit:
+    qemu_vfree(desc_entries);
+exit:
+    return ret;
+}
+
+
+/* Flushes the descriptor described by desc to the VHDX image file.
+ * If the descriptor is a data descriptor, then 'data' must be non-NULL,
+ * and >= 4096 bytes (VHDX_LOG_SECTOR_SIZE), containing the data to be
+ * written.
+ *
+ * Verification is performed to make sure the sequence numbers of a data
+ * descriptor match the sequence number in the desc.
+ *
+ * For a zero descriptor, it may describe multiple sectors to fill with zeroes.
+ * In this case, it should be noted that zeroes are written to disk, and the
+ * image file is not extended as a sparse file.  */
+static int vhdx_log_flush_desc(BlockDriverState *bs, VHDXLogDescriptor *desc,
+                               VHDXLogDataSector *data)
+{
+    int ret = 0;
+    uint64_t seq, file_offset;
+    uint32_t offset = 0;
+    void *buffer = NULL;
+    uint64_t count = 1;
+    int i;
+
+    buffer = qemu_blockalign(bs, VHDX_LOG_SECTOR_SIZE);
+
+    if (!memcmp(&desc->signature, "desc", 4)) {
+        /* data sector */
+        if (data == NULL) {
+            ret = -EFAULT;
+            goto exit;
+        }
+
+        /* The sequence number of the data sector must match that
+         * in the descriptor */
+        seq = data->sequence_high;
+        seq <<= 32;
+        seq |= data->sequence_low & 0xffffffff;
+
+        if (seq != desc->sequence_number) {
+            ret = -EINVAL;
+            goto exit;
+        }
+
+        /* Each data sector is in total 4096 bytes, however the first
+         * 8 bytes, and last 4 bytes, are located in the descriptor */
+        memcpy(buffer, &desc->leading_bytes, sizeof(desc->leading_bytes));
+        offset += sizeof(desc->leading_bytes);
+
+        memcpy(buffer+offset, data->data, 4084);
+        offset += 4084;
+
+        memcpy(buffer+offset, &desc->trailing_bytes,
+               sizeof(desc->trailing_bytes));
+
+    } else if (!memcmp(&desc->signature, "zero", 4)) {
+        /* write 'count' sectors of zeroes */
+        memset(buffer, 0, VHDX_LOG_SECTOR_SIZE);
+        count = desc->zero_length / VHDX_LOG_SECTOR_SIZE;
+    }
+
+    file_offset = desc->file_offset;
+
+    /* count is only > 1 if we are writing zeroes */
+    for (i = 0; i < count; i++) {
+        ret = bdrv_pwrite_sync(bs->file, file_offset, buffer,
+                               VHDX_LOG_SECTOR_SIZE);
+        if (ret < 0) {
+            goto exit;
+        }
+        file_offset += VHDX_LOG_SECTOR_SIZE;
+    }
+
+exit:
+    qemu_vfree(buffer);
+    return ret;
+}
+
+/* Flush the entire log (as described by 'logs') to the VHDX image
+ * file, and then set the log to 'empty' status once complete.
+ *
+ * The log entries should be validated prior to flushing */
+static int vhdx_log_flush(BlockDriverState *bs, BDRVVHDXState *s,
+                          VHDXLogSequence *logs)
+{
+    int ret = 0;
+    int i;
+    uint32_t cnt, sectors_read;
+    uint64_t new_file_size;
+    void *data = NULL;
+    VHDXLogDescEntries *desc_entries = NULL;
+    VHDXLogEntryHeader hdr_tmp = { 0 };
+
+    cnt = logs->count;
+
+    data = qemu_blockalign(bs, VHDX_LOG_SECTOR_SIZE);
+
+    vhdx_user_visible_write(bs, s);
+
+    /* each iteration represents one log entry, which may span multiple
+     * sectors */
+    while (cnt--) {
+        ret = vhdx_log_peek_hdr(bs, &logs->log, &hdr_tmp);
+        if (ret < 0) {
+            goto exit;
+        }
+        /* if the log shows a FlushedFileOffset larger than our current file
+         * size, then that means the file has been truncated / corrupted, and
+         * we must refuse to open it / use it */
+        if (hdr_tmp.flushed_file_offset > bdrv_getlength(bs->file)) {
+            ret = -EINVAL;
+            goto exit;
+        }
+
+        ret = vhdx_log_read_desc(bs, s, &logs->log, &desc_entries);
+        if (ret < 0) {
+            goto exit;
+        }
+
+        for (i = 0; i < desc_entries->hdr.descriptor_count; i++) {
+            if (!memcmp(&desc_entries->desc[i].signature, "desc", 4)) {
+                /* data sector, so read a sector to flush */
+                ret = vhdx_log_read_sectors(bs, &logs->log, &sectors_read,
+                                            data, 1, false);
+                if (ret < 0) {
+                    goto exit;
+                }
+                if (sectors_read != 1) {
+                    ret = -EINVAL;
+                    goto exit;
+                }
+            }
+
+            ret = vhdx_log_flush_desc(bs, &desc_entries->desc[i], data);
+            if (ret < 0) {
+                goto exit;
+            }
+        }
+        if (bdrv_getlength(bs->file) < desc_entries->hdr.last_file_offset) {
+            new_file_size = desc_entries->hdr.last_file_offset;
+            if (new_file_size % (1024*1024)) {
+                /* round up to nearest 1MB boundary */
+                new_file_size = ((new_file_size >> 20) + 1) << 20;
+                bdrv_truncate(bs->file, new_file_size);
+            }
+        }
+        qemu_vfree(desc_entries);
+        desc_entries = NULL;
+    }
+
+    /* once the log is fully flushed, indicate that we have an empty log
+     * now.  This also sets the log guid to 0, to indicate an empty log */
+    vhdx_log_reset(bs, s);
+
+exit:
+    qemu_vfree(data);
+    qemu_vfree(desc_entries);
+    return ret;
+}
+
+static int vhdx_validate_log_entry(BlockDriverState *bs, BDRVVHDXState *s,
+                                   VHDXLogEntries *log, uint64_t seq,
+                                   bool *valid, VHDXLogEntryHeader *entry)
+{
+    int ret = 0;
+    VHDXLogEntryHeader hdr;
+    void *buffer = NULL;
+    uint32_t i, desc_sectors, total_sectors, crc;
+    uint32_t sectors_read = 0;
+    VHDXLogDescEntries *desc_buffer = NULL;
+
+    *valid = false;
+
+    ret = vhdx_log_peek_hdr(bs, log, &hdr);
+    if (ret < 0) {
+        goto inc_and_exit;
+    }
+
+    vhdx_log_entry_hdr_le_import(&hdr);
+
+
+    if (vhdx_log_hdr_is_valid(log, &hdr, s) == false) {
+        goto inc_and_exit;
+    }
+
+    if (seq > 0) {
+        if (hdr.sequence_number != seq + 1) {
+            goto inc_and_exit;
+        }
+    }
+
+    desc_sectors = vhdx_compute_desc_sectors(hdr.descriptor_count);
+
+    /* Read desc sectors, and calculate log checksum */
+
+    total_sectors = hdr.entry_length / VHDX_LOG_SECTOR_SIZE;
+
+
+    /* read_desc() will increment the read idx */
+    ret = vhdx_log_read_desc(bs, s, log, &desc_buffer);
+    if (ret < 0) {
+        goto free_and_exit;
+    }
+
+    crc = vhdx_checksum_calc(0xffffffff, (void *)desc_buffer,
+                            desc_sectors * VHDX_LOG_SECTOR_SIZE, 4);
+    crc ^= 0xffffffff;
+
+    buffer = qemu_blockalign(bs, VHDX_LOG_SECTOR_SIZE);
+    if (total_sectors > desc_sectors) {
+        for (i = 0; i < total_sectors - desc_sectors; i++) {
+            sectors_read = 0;
+            ret = vhdx_log_read_sectors(bs, log, &sectors_read, buffer,
+                                        1, false);
+            if (ret < 0 || sectors_read != 1) {
+                goto free_and_exit;
+            }
+            crc = vhdx_checksum_calc(crc, buffer, VHDX_LOG_SECTOR_SIZE, -1);
+            crc ^= 0xffffffff;
+        }
+    }
+    crc ^= 0xffffffff;
+    if (crc != desc_buffer->hdr.checksum) {
+        goto free_and_exit;
+    }
+
+    *valid = true;
+    *entry = hdr;
+    goto free_and_exit;
+
+inc_and_exit:
+    log->read = vhdx_log_inc_idx(log->read, log->length);
+
+free_and_exit:
+    qemu_vfree(buffer);
+    qemu_vfree(desc_buffer);
+    return ret;
+}
+
+/* Search through the log circular buffer, and find the valid, active
+ * log sequence, if any exists
+ * */
+static int vhdx_log_search(BlockDriverState *bs, BDRVVHDXState *s,
+                           VHDXLogSequence *logs)
+{
+    int ret = 0;
+
+    uint64_t curr_seq = 0;
+    VHDXLogSequence candidate = { 0 };
+    VHDXLogSequence current = { 0 };
+
+    uint32_t tail;
+    bool seq_valid = false;
+    VHDXLogEntryHeader hdr = { 0 };
+    VHDXLogEntries curr_log;
+
+    memcpy(&curr_log, &s->log, sizeof(VHDXLogEntries));
+    curr_log.write = curr_log.length;   /* assume log is full */
+    curr_log.read = 0;
+
+
+    /* now we will go through the whole log sector by sector, until
+     * we find a valid, active log sequence, or reach the end of the
+     * log buffer */
+    for (;;) {
+        tail = curr_log.read;
+
+        curr_seq = 0;
+        memset(&current, 0, sizeof(current));
+
+        ret = vhdx_validate_log_entry(bs, s, &curr_log, curr_seq,
+                                      &seq_valid, &hdr);
+        if (ret < 0) {
+            goto exit;
+        }
+
+        if (seq_valid) {
+            current.valid     = true;
+            current.log       = curr_log;
+            current.log.read  = tail;
+            current.log.write = curr_log.read;
+            current.count     = 1;
+            current.hdr       = hdr;
+
+
+            for (;;) {
+                ret = vhdx_validate_log_entry(bs, s, &curr_log, curr_seq,
+                                              &seq_valid, &hdr);
+                if (ret < 0) {
+                    goto exit;
+                }
+                if (seq_valid == false) {
+                    break;
+                }
+                current.log.write = curr_log.read;
+                current.count++;
+
+                curr_seq = hdr.sequence_number;
+            }
+        }
+
+        if (current.valid) {
+            if (candidate.valid == false ||
+                current.hdr.sequence_number > candidate.hdr.sequence_number) {
+                candidate = current;
+            }
+        }
+
+        if (curr_log.read < tail) {
+            break;
+        }
+    }
+
+    *logs = candidate;
+
+    if (candidate.valid) {
+        /* this is the next sequence number, for writes */
+        s->log.sequence = candidate.hdr.sequence_number + 1;
+    }
+
+
+exit:
+    return ret;
+}
+
+/* Parse the replay log.  Per the VHDX spec, if the log is present
+ * it must be replayed prior to opening the file, even read-only.
+ *
+ * If read-only, we must replay the log in RAM (or refuse to open
+ * a dirty VHDX file read-only) */
+int vhdx_parse_log(BlockDriverState *bs, BDRVVHDXState *s)
+{
+    int ret = 0;
+    VHDXHeader *hdr;
+    VHDXLogSequence logs = { 0 };
+
+    hdr = s->headers[s->curr_header];
+
+    /* s->log.hdr is freed in vhdx_close() */
+    if (s->log.hdr == NULL) {
+        s->log.hdr = qemu_blockalign(bs, sizeof(VHDXLogEntryHeader));
+    }
+
+    s->log.offset = hdr->log_offset;
+    s->log.length = hdr->log_length;
+
+    if (s->log.offset < VHDX_LOG_MIN_SIZE ||
+        s->log.offset % VHDX_LOG_MIN_SIZE) {
+        ret = -EINVAL;
+        goto exit;
+    }
+
+    /* per spec, only log version of 0 is supported */
+    if (hdr->log_version != 0) {
+        ret = -EINVAL;
+        goto exit;
+    }
+
+    /* If either the log guid, or log length is zero,
+     * then a replay log is not present */
+    if (vhdx_log_guid_is_zero(&hdr->log_guid)) {
+        goto exit;
+    }
+
+
+
+    if (hdr->log_length == 0) {
+        goto exit;
+    }
+
+    if (hdr->log_length % VHDX_LOG_MIN_SIZE) {
+        ret = -EINVAL;
+        goto exit;
+    }
+
+
+    /* The log is present, we need to find if and where there is an active
+     * sequence of valid entries present in the log.  */
+
+    ret = vhdx_log_search(bs, s, &logs);
+    if (ret < 0) {
+        goto exit;
+    }
+
+    if (logs.valid) {
+        /* now flush the log */
+        ret = vhdx_log_flush(bs, s, &logs);
+    }
+
+
+exit:
+    return ret;
+}
+
diff --git a/block/vhdx.c b/block/vhdx.c
index f5689c3..a8dd6d7 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -735,48 +735,6 @@ exit:
     return ret;
 }
 
-/* Parse the replay log.  Per the VHDX spec, if the log is present
- * it must be replayed prior to opening the file, even read-only.
- *
- * If read-only, we must replay the log in RAM (or refuse to open
- * a dirty VHDX file read-only */
-static int vhdx_parse_log(BlockDriverState *bs, BDRVVHDXState *s)
-{
-    int ret = 0;
-    int i;
-    VHDXHeader *hdr;
-
-    hdr = s->headers[s->curr_header];
-
-    /* either the log guid, or log length is zero,
-     * then a replay log is present */
-    for (i = 0; i < sizeof(hdr->log_guid.data4); i++) {
-        ret |= hdr->log_guid.data4[i];
-    }
-    if (hdr->log_guid.data1 == 0 &&
-        hdr->log_guid.data2 == 0 &&
-        hdr->log_guid.data3 == 0 &&
-        ret == 0) {
-        goto exit;
-    }
-
-    /* per spec, only log version of 0 is supported */
-    if (hdr->log_version != 0) {
-        ret = -EINVAL;
-        goto exit;
-    }
-
-    if (hdr->log_length == 0) {
-        goto exit;
-    }
-
-    /* We currently do not support images with logs to replay */
-    ret = -ENOTSUP;
-
-exit:
-    return ret;
-}
-
 
 static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
 {
@@ -789,6 +747,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
 
     s->bat = NULL;
     s->first_visible_write = true;
+    s->log.write = s->log.read = 0;
 
     qemu_co_mutex_init(&s->lock);
 
@@ -1030,6 +989,7 @@ static void vhdx_close(BlockDriverState *bs)
     qemu_vfree(s->headers[1]);
     qemu_vfree(s->bat);
     qemu_vfree(s->parent_entries);
+    qemu_vfree(s->log.hdr);
 }
 
 static BlockDriver bdrv_vhdx = {
diff --git a/block/vhdx.h b/block/vhdx.h
index cb3ce0e..24b126e 100644
--- a/block/vhdx.h
+++ b/block/vhdx.h
@@ -326,7 +326,11 @@ typedef struct VHDXMetadataEntries {
 typedef struct VHDXLogEntries {
     uint64_t offset;
     uint64_t length;
-    uint32_t head;
+    uint32_t write;
+    uint32_t read;
+    VHDXLogEntryHeader *hdr;
+    void *desc_buffer;
+    uint64_t sequence;
     uint32_t tail;
 } VHDXLogEntries;
 
@@ -387,6 +391,7 @@ uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size,
 
 bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset);
 
+int vhdx_parse_log(BlockDriverState *bs, BDRVVHDXState *s);
 
 static inline void leguid_to_cpus(MSGUID *guid)
 {
-- 
1.8.1.4


* [Qemu-devel] [PATCH 8/9] block: vhdx - add log write support
  2013-07-24 17:54 [Qemu-devel] [PATCH 0/9] VHDX log replay and write support Jeff Cody
                   ` (6 preceding siblings ...)
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 7/9] block: vhdx - log parsing, replay, and flush support Jeff Cody
@ 2013-07-24 17:54 ` Jeff Cody
  2013-07-30  3:57   ` Fam Zheng
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 9/9] block: vhdx " Jeff Cody
  8 siblings, 1 reply; 19+ messages in thread
From: Jeff Cody @ 2013-07-24 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, stefanha

This adds support for writing to the VHDX log.

For spec details, see the VHDX Format Specification v1.00:
https://www.microsoft.com/en-us/download/details.aspx?id=34750

There are a few limitations to this log support:
1.) There is no caching yet
2.) The log is flushed after each entry

The primary write interface, vhdx_log_write_and_flush(), performs a log
write followed by an immediate flush of the log.

As each log entry sector is a minimum of 4KB, partial sector writes are
filled in with data from the disk write destination.

If the current file log GUID is 0, a new GUID is generated and updated
in the header.
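
The partial-sector accounting described above can be sketched in isolation.
This is a minimal standalone illustration under stated assumptions, not code
taken from the patch: LogSplit, log_split() and LOG_SECTOR_SIZE are
hypothetical names (the last one standing in for VHDX_LOG_SECTOR_SIZE), and
the sketch only computes how a write splits into 4KB log data sectors.  It
does not read the merge data from the image.

```c
#include <assert.h>
#include <stdint.h>

#define LOG_SECTOR_SIZE 4096u  /* stand-in for VHDX_LOG_SECTOR_SIZE */

/* Hypothetical helper type; nothing like it exists in the patch itself */
typedef struct LogSplit {
    uint64_t file_offset;     /* write offset rounded down to a 4KB boundary */
    uint32_t leading_length;  /* payload bytes in a partial first sector */
    uint32_t trailing_length; /* payload bytes in a partial last sector */
    uint32_t sectors;         /* 4KB data sectors needed, partial ones included */
} LogSplit;

/* Split a write of 'length' bytes at 'offset' into 4KB log data sectors */
static LogSplit log_split(uint64_t offset, uint32_t length)
{
    LogSplit sp = { 0 };
    uint32_t sector_offset = (uint32_t)(offset % LOG_SECTOR_SIZE);
    uint32_t remaining = length;
    uint32_t full;

    assert(length > 0);

    sp.file_offset = offset - sector_offset;

    if (sector_offset) {
        /* the write starts mid-sector: the first sector is only partly ours */
        uint32_t room = LOG_SECTOR_SIZE - sector_offset;
        sp.leading_length = room > length ? length : room;
        remaining -= sp.leading_length;
        sp.sectors++;
    }

    full = remaining / LOG_SECTOR_SIZE;
    sp.trailing_length = remaining - full * LOG_SECTOR_SIZE;
    sp.sectors += full + (sp.trailing_length ? 1 : 0);

    return sp;
}
```

With this split, an 8-byte BAT entry update always lands in exactly one log
data sector, and the other 4088 bytes of that sector have to be filled in
from the existing file contents before the entry is written to the log.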

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 block/vhdx-log.c | 273 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 block/vhdx.h     |   3 +
 2 files changed, 276 insertions(+)

diff --git a/block/vhdx-log.c b/block/vhdx-log.c
index 89b9000..786b393 100644
--- a/block/vhdx-log.c
+++ b/block/vhdx-log.c
@@ -170,6 +170,53 @@ exit:
     return ret;
 }
 
+/* Writes num_sectors to the log (all log sectors are 4096 bytes),
+ * from buffer 'buffer'.  Upon return, *sectors_written will contain
+ * the number of sectors successfully written.
+ *
+ * It is assumed that 'buffer' is at least 4096*num_sectors large.
+ *
+ * 0 is returned on success, -errno otherwise */
+static int vhdx_log_write_sectors(BlockDriverState *bs, VHDXLogEntries *log,
+                                  uint32_t *sectors_written, void *buffer,
+                                  uint32_t num_sectors)
+{
+    int ret = 0;
+    uint64_t offset;
+    uint32_t write;
+    void *buffer_tmp;
+    BDRVVHDXState *s = bs->opaque;
+
+    vhdx_user_visible_write(bs, s);
+
+    write = log->write;
+
+    buffer_tmp = buffer;
+    while (num_sectors) {
+
+        offset = log->offset + write;
+        write = vhdx_log_inc_idx(write, log->length);
+        if (write == log->read) {
+            /* full */
+            break;
+        }
+        ret = bdrv_pwrite_sync(bs->file, offset, buffer_tmp,
+                               VHDX_LOG_SECTOR_SIZE);
+        if (ret < 0) {
+            goto exit;
+        }
+        buffer_tmp += VHDX_LOG_SECTOR_SIZE;
+
+        log->write = write;
+        *sectors_written = *sectors_written + 1;
+        num_sectors--;
+    }
+
+exit:
+    return ret;
+}
+
+
 /* Validates a log entry header */
 static bool vhdx_log_hdr_is_valid(VHDXLogEntries *log, VHDXLogEntryHeader *hdr,
                                   BDRVVHDXState *s)
@@ -732,3 +779,229 @@ exit:
     return ret;
 }
 
+
+
+static void vhdx_log_raw_to_le_sector(VHDXLogDescriptor *desc,
+                                      VHDXLogDataSector *sector, void *data,
+                                      uint64_t seq)
+{
+    memcpy(&desc->leading_bytes, data, 8);
+    data += 8;
+    cpu_to_le64s(&desc->leading_bytes);
+    memcpy(sector->data, data, 4084);
+    data += 4084;
+    memcpy(&desc->trailing_bytes, data, 4);
+    cpu_to_le32s(&desc->trailing_bytes);
+    data += 4;
+
+    sector->sequence_high  = (uint32_t) (seq >> 32);
+    sector->sequence_low   = (uint32_t) (seq & 0xffffffff);
+    sector->data_signature = VHDX_LOG_DATA_SIGNATURE;
+
+    vhdx_log_desc_le_export(desc);
+    vhdx_log_data_le_export(sector);
+}
+
+
+static int vhdx_log_write(BlockDriverState *bs, BDRVVHDXState *s,
+                          void *data, uint32_t length, uint64_t offset)
+{
+    int ret = 0;
+    void *buffer = NULL;
+    void *merged_sector = NULL;
+    void *data_tmp, *sector_write;
+    unsigned int i;
+    int sector_offset;
+    uint32_t desc_sectors, sectors, total_length;
+    uint32_t sectors_written = 0;
+    uint32_t aligned_length;
+    uint32_t leading_length = 0;
+    uint32_t trailing_length = 0;
+    uint32_t partial_sectors = 0;
+    uint32_t bytes_written = 0;
+    uint64_t file_offset;
+    VHDXHeader *header;
+    VHDXLogEntryHeader new_hdr;
+    VHDXLogDescriptor *new_desc = NULL;
+    VHDXLogDataSector *data_sector = NULL;
+    MSGUID new_guid = { 0 };
+
+    header = s->headers[s->curr_header];
+
+    /* need to have offset read data, and be on 4096 byte boundary */
+
+    if (length > header->log_length) {
+        /* no log present.  we could create a log here instead of failing */
+        ret = -EINVAL;
+        goto exit;
+    }
+
+    if (vhdx_log_guid_is_zero(&header->log_guid)) {
+        vhdx_guid_generate(&new_guid);
+        vhdx_update_headers(bs, s, false, &new_guid);
+    } else {
+        /* currently, we require that the log be flushed after every write. */
+        ret = -ENOTSUP;
+        goto exit;
+    }
+
+    /* 0 is an invalid sequence number, but may also represent the first
+     * log write (or a wrapped seq) */
+    if (s->log.sequence == 0) {
+        s->log.sequence = 1;
+    }
+
+    sector_offset = offset % VHDX_LOG_SECTOR_SIZE;
+    file_offset = (offset / VHDX_LOG_SECTOR_SIZE) * VHDX_LOG_SECTOR_SIZE;
+
+    aligned_length = length;
+
+    /* add in the unaligned head and tail bytes */
+    if (sector_offset) {
+        leading_length = (VHDX_LOG_SECTOR_SIZE - sector_offset);
+        leading_length = leading_length > length ? length : leading_length;
+        aligned_length -= leading_length;
+        partial_sectors++;
+    }
+
+    sectors = aligned_length / VHDX_LOG_SECTOR_SIZE;
+    trailing_length = aligned_length - (sectors * VHDX_LOG_SECTOR_SIZE);
+    if (trailing_length) {
+        partial_sectors++;
+    }
+
+    sectors += partial_sectors;
+
+    /* sectors is now how many sectors the data itself takes, not
+     * including the header and descriptor metadata */
+
+    new_hdr = (VHDXLogEntryHeader) {
+                .signature           = VHDX_LOG_SIGNATURE,
+                .tail                = s->log.tail,
+                .sequence_number     = s->log.sequence,
+                .descriptor_count    = sectors,
+                .reserved            = 0,
+                .flushed_file_offset = bdrv_getlength(bs->file),
+                .last_file_offset    = bdrv_getlength(bs->file),
+              };
+
+    memcpy(&new_hdr.log_guid, &header->log_guid, sizeof(MSGUID));
+
+    desc_sectors = vhdx_compute_desc_sectors(new_hdr.descriptor_count);
+
+    total_length = (desc_sectors + sectors) * VHDX_LOG_SECTOR_SIZE;
+    new_hdr.entry_length = total_length;
+
+    vhdx_log_entry_hdr_le_export(&new_hdr);
+
+    buffer = qemu_blockalign(bs, total_length);
+    memcpy(buffer, &new_hdr, sizeof(new_hdr));
+
+    new_desc = (VHDXLogDescriptor *) (buffer + sizeof(new_hdr));
+    data_sector = buffer + (desc_sectors * VHDX_LOG_SECTOR_SIZE);
+    data_tmp = data;
+
+    /* All log sectors are 4KB, so for any partial sectors we must
+     * merge the data with preexisting data from the final file
+     * destination */
+    merged_sector = qemu_blockalign(bs, VHDX_LOG_SECTOR_SIZE);
+
+    for (i = 0; i < sectors; i++) {
+        new_desc->signature       = VHDX_LOG_DESC_SIGNATURE;
+        new_desc->sequence_number = s->log.sequence;
+        new_desc->file_offset     = file_offset;
+
+        if (i == 0 && leading_length) {
+            /* partial sector at the front of the buffer */
+            ret = bdrv_pread(bs->file, file_offset, merged_sector,
+                             VHDX_LOG_SECTOR_SIZE);
+            if (ret < 0) {
+                goto exit;
+            }
+            memcpy(merged_sector + sector_offset, data_tmp, leading_length);
+            bytes_written = leading_length;
+            sector_write = merged_sector;
+        } else if (i == sectors - 1 && trailing_length) {
+            /* partial sector at the end of the buffer */
+            ret = bdrv_pread(bs->file,
+                            file_offset,
+                            merged_sector + trailing_length,
+                            VHDX_LOG_SECTOR_SIZE - trailing_length);
+            if (ret < 0) {
+                goto exit;
+            }
+            memcpy(merged_sector, data_tmp, trailing_length);
+            bytes_written = trailing_length;
+            sector_write = merged_sector;
+        } else {
+            bytes_written = VHDX_LOG_SECTOR_SIZE;
+            sector_write = data_tmp;
+        }
+
+        /* populate the raw sector data into the proper structures,
+         * as well as update the descriptor, and convert to proper
+         * endianness */
+        vhdx_log_raw_to_le_sector(new_desc, data_sector, sector_write,
+                                  s->log.sequence);
+
+        data_tmp += bytes_written;
+        data_sector++;
+        new_desc++;
+        file_offset += VHDX_LOG_SECTOR_SIZE;
+    }
+
+    /* checksum covers entire entry, from the log header through the
+     * last data sector */
+    vhdx_update_checksum(buffer, total_length, 4);
+    cpu_to_le32s((uint32_t *)(buffer + 4));
+
+    /* now write to the log */
+    ret = vhdx_log_write_sectors(bs, &s->log, &sectors_written, buffer,
+                                 desc_sectors + sectors);
+    if (ret < 0) {
+        goto exit;
+    }
+
+    if (sectors_written != desc_sectors + sectors) {
+        /* instead of failing, we could flush the log here */
+        ret = -EINVAL;
+        goto exit;
+    }
+
+    s->log.sequence++;
+    /* write new tail */
+    s->log.tail = s->log.write;
+
+exit:
+    qemu_vfree(buffer);
+    qemu_vfree(merged_sector);
+    return ret;
+}
+
+/* Perform a log write, and then immediately flush the entire log */
+int vhdx_log_write_and_flush(BlockDriverState *bs, BDRVVHDXState *s,
+                             void *data, uint32_t length, uint64_t offset)
+{
+    int ret = 0;
+    VHDXLogSequence logs = { .valid = true,
+                             .count = 1,
+                             .hdr = { 0 } };
+
+
+    ret = vhdx_log_write(bs, s, data, length, offset);
+    if (ret < 0) {
+        goto exit;
+    }
+    logs.log = s->log;
+
+    ret = vhdx_log_flush(bs, s, &logs);
+    if (ret < 0) {
+        goto exit;
+    }
+
+    s->log = logs.log;
+
+exit:
+    return ret;
+}
+
diff --git a/block/vhdx.h b/block/vhdx.h
index 24b126e..b210efc 100644
--- a/block/vhdx.h
+++ b/block/vhdx.h
@@ -393,6 +393,9 @@ bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset);
 
 int vhdx_parse_log(BlockDriverState *bs, BDRVVHDXState *s);
 
+int vhdx_log_write_and_flush(BlockDriverState *bs, BDRVVHDXState *s,
+                             void *data, uint32_t length, uint64_t offset);
+
 static inline void leguid_to_cpus(MSGUID *guid)
 {
     le32_to_cpus(&guid->data1);
-- 
1.8.1.4

* [Qemu-devel] [PATCH 9/9] block: vhdx write support
  2013-07-24 17:54 [Qemu-devel] [PATCH 0/9] VHDX log replay and write support Jeff Cody
                   ` (7 preceding siblings ...)
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 8/9] block: vhdx - add log write support Jeff Cody
@ 2013-07-24 17:54 ` Jeff Cody
  2013-07-30  4:10   ` Fam Zheng
  8 siblings, 1 reply; 19+ messages in thread
From: Jeff Cody @ 2013-07-24 17:54 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, stefanha

This adds support for writing to VHDX image files, using coroutines.
Writes into the BAT table go through the VHDX log.  Currently, BAT
table writes occur when expanding a dynamic VHDX file and allocating a
new BAT entry.
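
The two calculations behind a BAT update can be sketched on their own.  This
is a minimal illustration under assumptions, not the driver code:
round_up_1mb() and bat_entry_pack() are hypothetical helpers, the bit layout
(block state in the low 3 bits, file offset in units of 1MB from bit 20
upward) is assumed from the spec's BAT entry description, and the state
value 6 in the usage note is only an example.

```c
#include <assert.h>
#include <stdint.h>

#define MiB              (1024u * 1024u)
#define BAT_STATE_MASK   0x7ull  /* assumed: low 3 bits hold the block state */
#define BAT_OFFSET_SHIFT 20      /* assumed: offset in MB starts at bit 20 */

/* Round a file offset up to the next 1MB boundary (no-op if aligned) */
static uint64_t round_up_1mb(uint64_t offset)
{
    return (offset + MiB - 1) & ~(uint64_t)(MiB - 1);
}

/* Pack a 1MB-aligned payload block offset and a block state into a BAT
 * entry, in CPU byte order; it still needs converting to little endian
 * before it goes to disk */
static uint64_t bat_entry_pack(uint64_t file_offset, unsigned int state)
{
    assert(file_offset % MiB == 0);

    return ((file_offset >> 20) << BAT_OFFSET_SHIFT) |
           ((uint64_t)state & BAT_STATE_MASK);
}
```

For example, a block allocated at the 4MB mark with state 6 packs to
0x400006.  That 8-byte value is what gets handed to the log, rather than
being written directly into the BAT region.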

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 block/vhdx.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 147 insertions(+), 2 deletions(-)

diff --git a/block/vhdx.c b/block/vhdx.c
index a8dd6d7..791c6dc 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -831,7 +831,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
         vhdx_update_headers(bs, s, false, NULL);
     }
 
-    /* TODO: differencing files, write */
+    /* TODO: differencing files */
 
     return 0;
 fail:
@@ -963,7 +963,45 @@ exit:
     return ret;
 }
 
+/*
+ * Allocate a new payload block at the end of the file.
+ *
+ * Allocation will happen at 1MB alignment inside the file
+ *
+ * Returns the file offset start of the new payload block
+ */
+static int vhdx_allocate_block(BlockDriverState *bs, BDRVVHDXState *s,
+                                    uint64_t *new_offset)
+{
+    *new_offset = bdrv_getlength(bs->file);
 
+    /* per the spec, the address for a block is in units of 1MB */
+    if (*new_offset % (1024*1024)) {
+        *new_offset = ((*new_offset >> 20) + 1) << 20;  /* round up to 1MB */
+    }
+
+    return bdrv_truncate(bs->file, *new_offset + s->block_size);
+}
+
+/*
+ * Update the BAT table entry with the new file offset, and the new entry
+ * state */
+static void vhdx_update_bat_table_entry(BlockDriverState *bs, BDRVVHDXState *s,
+                                       VHDXSectorInfo *sinfo,
+                                       uint64_t *bat_entry,
+                                       uint64_t *bat_offset, int state)
+{
+    /* The BAT entry is a uint64, with 44 bits for the file offset in units of
+     * 1MB, and 3 bits for the block state. */
+    s->bat[sinfo->bat_idx]  = ((sinfo->file_offset>>20) <<
+                               VHDX_BAT_FILE_OFF_BITS);
+
+    s->bat[sinfo->bat_idx] |= state & VHDX_BAT_STATE_BIT_MASK;
+
+    *bat_entry = cpu_to_le64(s->bat[sinfo->bat_idx]);
+    *bat_offset = s->bat_offset + sinfo->bat_idx * sizeof(VHDXBatEntry);
+
+}
 
 /* Per the spec, on the first write of guest-visible data to the file the
  * data write guid must be updated in the header */
@@ -978,7 +1016,114 @@ void vhdx_user_visible_write(BlockDriverState *bs, BDRVVHDXState *s)
 static coroutine_fn int vhdx_co_writev(BlockDriverState *bs, int64_t sector_num,
                                       int nb_sectors, QEMUIOVector *qiov)
 {
-    return -ENOTSUP;
+    int ret = -ENOTSUP;
+    BDRVVHDXState *s = bs->opaque;
+    VHDXSectorInfo sinfo;
+    uint64_t bytes_done = 0;
+    uint64_t bat_entry = 0;
+    uint64_t bat_entry_offset = 0;
+    bool bat_update;
+    QEMUIOVector hd_qiov;
+
+    qemu_iovec_init(&hd_qiov, qiov->niov);
+
+    qemu_co_mutex_lock(&s->lock);
+
+    vhdx_user_visible_write(bs, s);
+
+    while (nb_sectors > 0) {
+        if (s->params.data_bits & VHDX_PARAMS_HAS_PARENT) {
+            /* not supported yet */
+            ret = -ENOTSUP;
+            goto exit;
+        } else {
+            bat_update = false;
+            vhdx_block_translate(s, sector_num, nb_sectors, &sinfo);
+
+            qemu_iovec_reset(&hd_qiov);
+            qemu_iovec_concat(&hd_qiov, qiov,  bytes_done, sinfo.bytes_avail);
+            /* check the payload block state */
+            switch (s->bat[sinfo.bat_idx] & VHDX_BAT_STATE_BIT_MASK) {
+            case PAYLOAD_BLOCK_ZERO:
+                /* in this case, we need to preserve zero writes for
+                 * data that is not part of this write, so we must pad
+                 * the rest of the buffer to zeroes */
+
+                /* if we are on a posix system with ftruncate() that extends
+                 * a file, then it is zero-filled for us.  On Win32, the raw
+                 * layer uses SetFilePointer and SetFileEnd, which does not
+                 * zero fill AFAIK */
+
+                /* TODO: queue another write of zero buffers if the host OS does
+                 * not zero-fill on file extension */
+
+                /* fall through */
+            case PAYLOAD_BLOCK_NOT_PRESENT: /* fall through */
+            case PAYLOAD_BLOCK_UNMAPPED:    /* fall through */
+            case PAYLOAD_BLOCK_UNDEFINED:   /* fall through */
+                ret = vhdx_allocate_block(bs, s, &sinfo.file_offset);
+                if (ret < 0) {
+                    goto exit;
+                }
+                /* once we support differencing files, this may also be
+                 * partially present */
+                /* update block state to the newly specified state */
+                vhdx_update_bat_table_entry(bs, s, &sinfo, &bat_entry,
+                                            &bat_entry_offset,
+                                            PAYLOAD_BLOCK_FULL_PRESENT);
+                bat_update = true;
+                /* since we just allocated a block, file_offset is the
+                 * beginning of the payload block. It needs to be the
+                 * write address, which includes the offset into the block */
+                sinfo.file_offset += sinfo.block_offset;
+                /* fall through */
+            case PAYLOAD_BLOCK_FULL_PRESENT:
+                /* if the file offset address is in the header zone,
+                 * there is a problem */
+                if (sinfo.file_offset < (1024*1024)) {
+                    ret = -EFAULT;
+                    goto exit;
+                }
+                /* block exists, so we can just overwrite it */
+                qemu_co_mutex_unlock(&s->lock);
+                ret = bdrv_co_writev(bs->file,
+                                    sinfo.file_offset>>BDRV_SECTOR_BITS,
+                                    sinfo.sectors_avail, &hd_qiov);
+                qemu_co_mutex_lock(&s->lock);
+                if (ret < 0) {
+                    goto exit;
+                }
+                break;
+            case PAYLOAD_BLOCK_PARTIALLY_PRESENT:
+                /* we don't yet support difference files, fall through
+                 * to error */
+            default:
+                ret = -EIO;
+                goto exit;
+                break;
+            }
+
+            if (bat_update) {
+                /* this will update the BAT entry into the log journal, and
+                 * then flush the log journal out to disk */
+                ret =  vhdx_log_write_and_flush(bs, s, &bat_entry,
+                                                sizeof(VHDXBatEntry),
+                                                bat_entry_offset);
+                if (ret < 0) {
+                    goto exit;
+                }
+            }
+
+            nb_sectors -= sinfo.sectors_avail;
+            sector_num += sinfo.sectors_avail;
+            bytes_done += sinfo.bytes_avail;
+
+        }
+    }
+exit:
+    qemu_iovec_destroy(&hd_qiov);
+    qemu_co_mutex_unlock(&s->lock);
+    return ret;
 }
 
 
-- 
1.8.1.4

* Re: [Qemu-devel] [PATCH 2/9] block: vhdx - add header update capability.
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 2/9] block: vhdx - add header update capability Jeff Cody
@ 2013-07-26  6:49   ` Fam Zheng
  2013-07-26 11:39     ` Jeff Cody
  0 siblings, 1 reply; 19+ messages in thread
From: Fam Zheng @ 2013-07-26  6:49 UTC (permalink / raw)
  To: Jeff Cody; +Cc: kwolf, qemu-devel, stefanha

On Wed, 07/24 13:54, Jeff Cody wrote:
> This adds the ability to update the headers in a VHDX image, including
> generating a new MS-compatible GUID.
> 
> As VHDX depends on uuid.h, VHDX is now a configurable build option.  If
> VHDX support is enabled, that will also enable uuid as well.  The
> default is to have VHDX enabled.
> 
> To enable/disable VHDX:  --enable-vhdx, --disable-vhdx
> 
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
>  block/Makefile.objs |   2 +-
>  block/vhdx.c        | 157 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>  block/vhdx.h        |  12 +++-
>  configure           |  13 +++++
>  4 files changed, 180 insertions(+), 4 deletions(-)
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 4cf9aa4..e5e54e6 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -2,7 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
>  block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>  block-obj-y += qed-check.o
> -block-obj-y += vhdx.o
> +block-obj-$(CONFIG_VHDX) += vhdx.o
>  block-obj-y += parallels.o blkdebug.o blkverify.o
>  block-obj-y += snapshot.o qapi.o
>  block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
> diff --git a/block/vhdx.c b/block/vhdx.c
> index 56bc88e..13e486d 100644
> --- a/block/vhdx.c
> +++ b/block/vhdx.c
> @@ -21,6 +21,7 @@
>  #include "qemu/crc32c.h"
>  #include "block/vhdx.h"
>  
> +#include <uuid/uuid.h>
>  
>  /* Several metadata and region table data entries are identified by
>   * guids in  a MS-specific GUID format. */
> @@ -156,11 +157,40 @@ typedef struct BDRVVHDXState {
>      VHDXBatEntry *bat;
>      uint64_t bat_offset;
>  
> +    MSGUID session_guid;
> +
> +
>      VHDXParentLocatorHeader parent_header;
>      VHDXParentLocatorEntry *parent_entries;
>  
>  } BDRVVHDXState;
>  
> +/* Calculates new checksum.
> + *
> + * Zero is substituted during crc calculation for the original crc field
> + * crc_offset: byte offset in buf of the buffer crc
> + * buf: buffer pointer
> + * size: size of buffer (must be > crc_offset+4)
> + *
> + * Note: The resulting checksum is in the CPU endianness, not necessarily
> + *       in the file format endianness (LE).  Any header export to disk should
> + *       make sure that vhdx_header_le_export() is used to convert to the
> + *       correct endianness
> + */
> +uint32_t vhdx_update_checksum(uint8_t *buf, size_t size, int crc_offset)
> +{
> +    uint32_t crc;
> +
> +    assert(buf != NULL);
> +    assert(size > (crc_offset + 4));
> +
> +    memset(buf + crc_offset, 0, sizeof(crc));
> +    crc =  crc32c(0xffffffff, buf, size);
> +    memcpy(buf + crc_offset, &crc, sizeof(crc));
> +
> +    return crc;
> +}
> +
>  uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size,
>                              int crc_offset)
>  {
> @@ -212,6 +242,24 @@ bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset)
>  
>  
>  /*
> + * This generates a UUID that is compliant with the MS GUIDs used
> + * in the VHDX spec (and elsewhere).
> + *
> + * We can do this with uuid_generate if uuid.h is present,
> + * however not all systems have uuid and the generation is
> + * pretty straightforward for the DCE + random usage case
> + *
> + */
> +void vhdx_guid_generate(MSGUID *guid)
> +{
> +    uuid_t uuid;
> +    assert(guid != NULL);
> +
> +    uuid_generate(uuid);
> +    memcpy(guid, uuid, 16);
> +}
> +
> +/*
>   * Per the MS VHDX Specification, for every VHDX file:
>   *      - The header section is fixed size - 1 MB
>   *      - The header section is always the first "object"
> @@ -249,6 +297,107 @@ static void vhdx_header_le_import(VHDXHeader *h)
>      le64_to_cpus(&h->log_offset);
>  }
>  
> +/* All VHDX structures on disk are little endian */
> +static void vhdx_header_le_export(VHDXHeader *orig_h, VHDXHeader *new_h)
> +{
> +    assert(orig_h != NULL);
> +    assert(new_h != NULL);
> +
> +    new_h->signature       = cpu_to_le32(orig_h->signature);
> +    new_h->checksum        = cpu_to_le32(orig_h->checksum);
> +    new_h->sequence_number = cpu_to_le64(orig_h->sequence_number);
> +
> +    memcpy(&new_h->file_write_guid, &orig_h->file_write_guid, sizeof(MSGUID));
> +    memcpy(&new_h->data_write_guid, &orig_h->data_write_guid, sizeof(MSGUID));
> +    memcpy(&new_h->log_guid,        &orig_h->log_guid,        sizeof(MSGUID));
> +
> +    cpu_to_leguids(&new_h->file_write_guid);
> +    cpu_to_leguids(&new_h->data_write_guid);
> +    cpu_to_leguids(&new_h->log_guid);
> +
> +    new_h->log_version     = cpu_to_le16(orig_h->log_version);
> +    new_h->version         = cpu_to_le16(orig_h->version);
> +    new_h->log_length      = cpu_to_le32(orig_h->log_length);
> +    new_h->log_offset      = cpu_to_le64(orig_h->log_offset);
> +}
> +
> +/* Update the VHDX headers
> + *
> + * This follows the VHDX spec procedures for header updates.
> + *
> + *  - non-current header is updated with largest sequence number
> + */
> +static int vhdx_update_header(BlockDriverState *bs, BDRVVHDXState *s, bool rw)
> +{
> +    int ret = 0;
> +    int hdr_idx = 0;
> +    uint64_t header_offset = VHDX_HEADER1_OFFSET;
> +
> +    VHDXHeader *active_header;
> +    VHDXHeader *inactive_header;
> +    VHDXHeader header_le;
> +    uint8_t *buffer;
> +
> +    /* operate on the non-current header */
> +    if (s->curr_header == 0) {
> +        hdr_idx = 1;
> +        header_offset = VHDX_HEADER2_OFFSET;
> +    }
> +
> +    active_header   = s->headers[s->curr_header];
> +    inactive_header = s->headers[hdr_idx];
> +
> +    inactive_header->sequence_number = active_header->sequence_number + 1;
> +
> +    /* a new file guid must be generated before any file write, including
> +     * headers */
> +    memcpy(&inactive_header->file_write_guid, &s->session_guid,
> +           sizeof(MSGUID));
> +
> +    /* a new data guid only needs to be generated before any guest-visible
> +     * writes, so update it if the image is opened r/w. */
> +    if (rw) {
> +        vhdx_guid_generate(&inactive_header->data_write_guid);
> +    }
> +
> +    /* the header checksum is not over just the packed size of VHDXHeader,
> +     * but rather over the entire 'reserved' range for the header, which is
> +     * 4KB (VHDX_HEADER_SIZE). */
> +
> +    buffer = qemu_blockalign(bs, VHDX_HEADER_SIZE);
> +    /* we can't assume the extra reserved bytes are 0 */
> +    ret = bdrv_pread(bs->file, header_offset, buffer, VHDX_HEADER_SIZE);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +    /* overwrite the actual VHDXHeader portion */
> +    memcpy(buffer, inactive_header, sizeof(VHDXHeader));
> +    inactive_header->checksum = vhdx_update_checksum(buffer,
> +                                                     VHDX_HEADER_SIZE, 4);
> +    vhdx_header_le_export(inactive_header, &header_le);
> +    bdrv_pwrite_sync(bs->file, header_offset, &header_le, sizeof(VHDXHeader));
> +    s->curr_header = hdr_idx;
> +
> +fail:

Labeling it "fail:" is a bit misleading as normal path ends here too.
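
A minimal sketch of the shape this refers to, with a neutral label name for
a cleanup path shared by success and error.  fill_buffer() and its
parameters are hypothetical and stand in for the real routine; only the
label naming is the point:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical routine with a single cleanup path */
static int fill_buffer(size_t size, int fail_midway)
{
    int ret = 0;
    unsigned char *buffer;

    assert(size > 0);

    buffer = malloc(size);
    if (buffer == NULL) {
        ret = -ENOMEM;
        goto exit;
    }

    if (fail_midway) {
        /* simulated I/O error */
        ret = -EIO;
        goto exit;
    }

    memset(buffer, 0, size);

exit:
    /* reached on success and on error alike, hence "exit" and not "fail" */
    free(buffer);
    return ret;
}
```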

> +    qemu_vfree(buffer);
> +    return ret;
> +}
> +
> +/*
> + * The VHDX spec calls for header updates to be performed twice, so that both
> + * the current and non-current header have valid info
> + */
> +static int vhdx_update_headers(BlockDriverState *bs, BDRVVHDXState *s, bool rw)
> +{
> +    int ret;
> +
> +    ret = vhdx_update_header(bs, s, rw);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    ret = vhdx_update_header(bs, s, rw);
> +    return ret;
> +}
>  
>  /* opens the specified header block from the VHDX file header section */
>  static int vhdx_parse_header(BlockDriverState *bs, BDRVVHDXState *s)
> @@ -739,6 +888,11 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
>          goto fail;
>      }
>  
> +    /* This is used for any header updates, for the file_write_guid.
> +     * The spec dictates that a new value should be used for the first
> +     * header update */
> +    vhdx_guid_generate(&s->session_guid);
> +
>      ret = vhdx_parse_header(bs, s);
>      if (ret) {
>          goto fail;
> @@ -801,8 +955,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
>      }
>  
>      if (flags & BDRV_O_RDWR) {
> -        ret = -ENOTSUP;
> -        goto fail;
> +        vhdx_update_headers(bs, s, false);
>      }
>  
>      /* TODO: differencing files, write */
> diff --git a/block/vhdx.h b/block/vhdx.h
> index 1dbb320..3999cb1 100644
> --- a/block/vhdx.h
> +++ b/block/vhdx.h
> @@ -309,17 +309,27 @@ typedef struct QEMU_PACKED VHDXParentLocatorEntry {
>  /* ----- END VHDX SPECIFICATION STRUCTURES ---- */
>  
>  
> +void vhdx_guid_generate(MSGUID *guid);
> +
> +uint32_t vhdx_update_checksum(uint8_t *buf, size_t size, int crc_offset);
>  uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size,
>                              int crc_offset);
>  
>  bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset);
>  
>  
> -static void leguid_to_cpus(MSGUID *guid)
> +static inline void leguid_to_cpus(MSGUID *guid)
>  {
>      le32_to_cpus(&guid->data1);
>      le16_to_cpus(&guid->data2);
>      le16_to_cpus(&guid->data3);
>  }
>  
> +static inline void cpu_to_leguids(MSGUID *guid)
> +{
> +    cpu_to_le32s(&guid->data1);
> +    cpu_to_le16s(&guid->data2);
> +    cpu_to_le16s(&guid->data3);
> +}
> +
>  #endif
> diff --git a/configure b/configure
> index 877a821..821b790 100755
> --- a/configure
> +++ b/configure
> @@ -244,6 +244,7 @@ gtk=""
>  gtkabi="2.0"
>  tpm="no"
>  libssh2=""
> +vhdx="yes"
>  
>  # parse CC options first
>  for opt do
> @@ -950,6 +951,11 @@ for opt do
>    ;;
>    --enable-libssh2) libssh2="yes"
>    ;;
> +  --enable-vhdx) vhdx="yes" ;
> +                 uuid="yes"
> +  ;;
> +  --disable-vhdx) vhdx="no"
> +  ;;
>    *) echo "ERROR: unknown option $opt"; show_help="yes"
>    ;;
>    esac
> @@ -1166,6 +1172,8 @@ echo "  --gcov=GCOV              use specified gcov [$gcov_tool]"
>  echo "  --enable-tpm             enable TPM support"
>  echo "  --disable-libssh2        disable ssh block device support"
>  echo "  --enable-libssh2         enable ssh block device support"
> +echo "  --disable-vhdx           disables support for the Microsoft VHDX image format"
> +echo "  --enable-vhdx            enable support for the Microsoft VHDX image format"
>  echo ""
>  echo "NOTE: The object files are built at the place where configure is launched"
>  exit 1
> @@ -3622,6 +3630,7 @@ echo "TPM support       $tpm"
>  echo "libssh2 support   $libssh2"
>  echo "TPM passthrough   $tpm_passthrough"
>  echo "QOM debugging     $qom_cast_debug"
> +echo "vhdx              $vhdx"
>  
>  if test "$sdl_too_old" = "yes"; then
>  echo "-> Your SDL version is too old - please upgrade to have SDL support"
> @@ -4012,6 +4021,10 @@ if test "$virtio_blk_data_plane" = "yes" ; then
>    echo 'CONFIG_VIRTIO_BLK_DATA_PLANE=$(CONFIG_VIRTIO)' >> $config_host_mak
>  fi
>  
> +if test "$vhdx" = "yes" ; then
> +  echo "CONFIG_VHDX=y" >> $config_host_mak
> +fi
> +
>  # USB host support
>  case "$usb" in
>  linux)
> -- 
> 1.8.1.4
> 
> 

-- 
Fam

* Re: [Qemu-devel] [PATCH 2/9] block: vhdx - add header update capability.
  2013-07-26  6:49   ` Fam Zheng
@ 2013-07-26 11:39     ` Jeff Cody
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff Cody @ 2013-07-26 11:39 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, qemu-devel, stefanha

On Fri, Jul 26, 2013 at 02:49:04PM +0800, Fam Zheng wrote:
> On Wed, 07/24 13:54, Jeff Cody wrote:
> > This adds the ability to update the headers in a VHDX image, including
> > generating a new MS-compatible GUID.
> > 
> > As VHDX depends on uuid.h, VHDX is now a configurable build option.  If
> > VHDX support is enabled, that will also enable uuid as well.  The
> > default is to have VHDX enabled.
> > 
> > To enable/disable VHDX:  --enable-vhdx, --disable-vhdx
> > 
> > Signed-off-by: Jeff Cody <jcody@redhat.com>
> > ---
> >  block/Makefile.objs |   2 +-
> >  block/vhdx.c        | 157 +++++++++++++++++++++++++++++++++++++++++++++++++++-
> >  block/vhdx.h        |  12 +++-
> >  configure           |  13 +++++
> >  4 files changed, 180 insertions(+), 4 deletions(-)
> > 
> > diff --git a/block/Makefile.objs b/block/Makefile.objs
> > index 4cf9aa4..e5e54e6 100644
> > --- a/block/Makefile.objs
> > +++ b/block/Makefile.objs
> > @@ -2,7 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
> >  block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
> >  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
> >  block-obj-y += qed-check.o
> > -block-obj-y += vhdx.o
> > +block-obj-$(CONFIG_VHDX) += vhdx.o
> >  block-obj-y += parallels.o blkdebug.o blkverify.o
> >  block-obj-y += snapshot.o qapi.o
> >  block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
> > diff --git a/block/vhdx.c b/block/vhdx.c
> > index 56bc88e..13e486d 100644
> > --- a/block/vhdx.c
> > +++ b/block/vhdx.c
> > @@ -21,6 +21,7 @@
> >  #include "qemu/crc32c.h"
> >  #include "block/vhdx.h"
> >  
> > +#include <uuid/uuid.h>
> >  
> >  /* Several metadata and region table data entries are identified by
> >   * guids in  a MS-specific GUID format. */
> > @@ -156,11 +157,40 @@ typedef struct BDRVVHDXState {
> >      VHDXBatEntry *bat;
> >      uint64_t bat_offset;
> >  
> > +    MSGUID session_guid;
> > +
> > +
> >      VHDXParentLocatorHeader parent_header;
> >      VHDXParentLocatorEntry *parent_entries;
> >  
> >  } BDRVVHDXState;
> >  
> > +/* Calculates new checksum.
> > + *
> > + * Zero is substituted during crc calculation for the original crc field
> > + * crc_offset: byte offset in buf of the buffer crc
> > + * buf: buffer pointer
> > + * size: size of buffer (must be > crc_offset+4)
> > + *
> > + * Note: The resulting checksum is in the CPU endianness, not necessarily
> > + *       in the file format endianness (LE).  Any header export to disk should
> > + *       make sure that vhdx_header_le_export() is used to convert to the
> > + *       correct endianness
> > + */
> > +uint32_t vhdx_update_checksum(uint8_t *buf, size_t size, int crc_offset)
> > +{
> > +    uint32_t crc;
> > +
> > +    assert(buf != NULL);
> > +    assert(size > (crc_offset + 4));
> > +
> > +    memset(buf + crc_offset, 0, sizeof(crc));
> > +    crc =  crc32c(0xffffffff, buf, size);
> > +    memcpy(buf + crc_offset, &crc, sizeof(crc));
> > +
> > +    return crc;
> > +}
> > +
> >  uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size,
> >                              int crc_offset)
> >  {
> > @@ -212,6 +242,24 @@ bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset)
> >  
> >  
> >  /*
> > + * This generates a UUID that is compliant with the MS GUIDs used
> > + * in the VHDX spec (and elsewhere).
> > + *
> > + * We can do this with uuid_generate if uuid.h is present,
> > + * however not all systems have uuid and the generation is
> > + * pretty straightforward for the DCE + random usage case
> > + *
> > + */
> > +void vhdx_guid_generate(MSGUID *guid)
> > +{
> > +    uuid_t uuid;
> > +    assert(guid != NULL);
> > +
> > +    uuid_generate(uuid);
> > +    memcpy(guid, uuid, 16);
> > +}
> > +
> > +/*
> >   * Per the MS VHDX Specification, for every VHDX file:
> >   *      - The header section is fixed size - 1 MB
> >   *      - The header section is always the first "object"
> > @@ -249,6 +297,107 @@ static void vhdx_header_le_import(VHDXHeader *h)
> >      le64_to_cpus(&h->log_offset);
> >  }
> >  
> > +/* All VHDX structures on disk are little endian */
> > +static void vhdx_header_le_export(VHDXHeader *orig_h, VHDXHeader *new_h)
> > +{
> > +    assert(orig_h != NULL);
> > +    assert(new_h != NULL);
> > +
> > +    new_h->signature       = cpu_to_le32(orig_h->signature);
> > +    new_h->checksum        = cpu_to_le32(orig_h->checksum);
> > +    new_h->sequence_number = cpu_to_le64(orig_h->sequence_number);
> > +
> > +    memcpy(&new_h->file_write_guid, &orig_h->file_write_guid, sizeof(MSGUID));
> > +    memcpy(&new_h->data_write_guid, &orig_h->data_write_guid, sizeof(MSGUID));
> > +    memcpy(&new_h->log_guid,        &orig_h->log_guid,        sizeof(MSGUID));
> > +
> > +    cpu_to_leguids(&new_h->file_write_guid);
> > +    cpu_to_leguids(&new_h->data_write_guid);
> > +    cpu_to_leguids(&new_h->log_guid);
> > +
> > +    new_h->log_version     = cpu_to_le16(orig_h->log_version);
> > +    new_h->version         = cpu_to_le16(orig_h->version);
> > +    new_h->log_length      = cpu_to_le32(orig_h->log_length);
> > +    new_h->log_offset      = cpu_to_le64(orig_h->log_offset);
> > +}
> > +
> > +/* Update the VHDX headers
> > + *
> > + * This follows the VHDX spec procedures for header updates.
> > + *
> > + *  - non-current header is updated with largest sequence number
> > + */
> > +static int vhdx_update_header(BlockDriverState *bs, BDRVVHDXState *s, bool rw)
> > +{
> > +    int ret = 0;
> > +    int hdr_idx = 0;
> > +    uint64_t header_offset = VHDX_HEADER1_OFFSET;
> > +
> > +    VHDXHeader *active_header;
> > +    VHDXHeader *inactive_header;
> > +    VHDXHeader header_le;
> > +    uint8_t *buffer;
> > +
> > +    /* operate on the non-current header */
> > +    if (s->curr_header == 0) {
> > +        hdr_idx = 1;
> > +        header_offset = VHDX_HEADER2_OFFSET;
> > +    }
> > +
> > +    active_header   = s->headers[s->curr_header];
> > +    inactive_header = s->headers[hdr_idx];
> > +
> > +    inactive_header->sequence_number = active_header->sequence_number + 1;
> > +
> > +    /* a new file guid must be generated before any file write, including
> > +     * headers */
> > +    memcpy(&inactive_header->file_write_guid, &s->session_guid,
> > +           sizeof(MSGUID));
> > +
> > +    /* a new data guid only needs to be generated before any guest-visible
> > +     * writes, so update it if the image is opened r/w. */
> > +    if (rw) {
> > +        vhdx_guid_generate(&inactive_header->data_write_guid);
> > +    }
> > +
> > +    /* the header checksum is not over just the packed size of VHDXHeader,
> > +     * but rather over the entire 'reserved' range for the header, which is
> > +     * 4KB (VHDX_HEADER_SIZE). */
> > +
> > +    buffer = qemu_blockalign(bs, VHDX_HEADER_SIZE);
> > +    /* we can't assume the extra reserved bytes are 0 */
> > +    ret = bdrv_pread(bs->file, header_offset, buffer, VHDX_HEADER_SIZE);
> > +    if (ret < 0) {
> > +        goto fail;
> > +    }
> > +    /* overwrite the actual VHDXHeader portion */
> > +    memcpy(buffer, inactive_header, sizeof(VHDXHeader));
> > +    inactive_header->checksum = vhdx_update_checksum(buffer,
> > +                                                     VHDX_HEADER_SIZE, 4);
> > +    vhdx_header_le_export(inactive_header, &header_le);
> > +    bdrv_pwrite_sync(bs->file, header_offset, &header_le, sizeof(VHDXHeader));
> > +    s->curr_header = hdr_idx;
> > +
> > +fail:
> 
> Labeling it "fail:" is a bit misleading as normal path ends here too.
>

Good point; I'll change that to just be 'exit'.
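
For reference, the single-label cleanup idiom being discussed could look roughly like the sketch below. This is not the VHDX code: update_header_sketch() and its parameters are hypothetical, and malloc()/free() stand in for qemu_blockalign()/qemu_vfree(). The point is only that one neutrally named label serves both the success path and the error path.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a header update: allocate a scratch buffer,
 * do work that may fail, and release the buffer on every path. */
static int update_header_sketch(const char *src, size_t len, char *dst,
                                size_t dst_len)
{
    int ret = 0;
    char *buffer = malloc(len);

    if (buffer == NULL) {
        return -ENOMEM;
    }

    if (len > dst_len) {
        ret = -EINVAL;      /* error path jumps to the shared cleanup */
        goto exit;
    }

    /* stage the data in the scratch buffer, then copy it out */
    memcpy(buffer, src, len);
    memcpy(dst, buffer, len);

exit:                       /* reached on success and on failure */
    free(buffer);
    return ret;
}
```

Since the normal path falls through the label as well, a name like 'exit' describes what the label does better than 'fail'.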

> > +    qemu_vfree(buffer);
> > +    return ret;
> > +}
> > +
> > +/*
> > + * The VHDX spec calls for header updates to be performed twice, so that both
> > + * the current and non-current header have valid info
> > + */
> > +static int vhdx_update_headers(BlockDriverState *bs, BDRVVHDXState *s, bool rw)
> > +{
> > +    int ret;
> > +
> > +    ret = vhdx_update_header(bs, s, rw);
> > +    if (ret < 0) {
> > +        return ret;
> > +    }
> > +    ret = vhdx_update_header(bs, s, rw);
> > +    return ret;
> > +}
> >  
> >  /* opens the specified header block from the VHDX file header section */
> >  static int vhdx_parse_header(BlockDriverState *bs, BDRVVHDXState *s)
> > @@ -739,6 +888,11 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
> >          goto fail;
> >      }
> >  
> > +    /* This is used for any header updates, for the file_write_guid.
> > +     * The spec dictates that a new value should be used for the first
> > +     * header update */
> > +    vhdx_guid_generate(&s->session_guid);
> > +
> >      ret = vhdx_parse_header(bs, s);
> >      if (ret) {
> >          goto fail;
> > @@ -801,8 +955,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
> >      }
> >  
> >      if (flags & BDRV_O_RDWR) {
> > -        ret = -ENOTSUP;
> > -        goto fail;
> > +        vhdx_update_headers(bs, s, false);
> >      }
> >  
> >      /* TODO: differencing files, write */
> > diff --git a/block/vhdx.h b/block/vhdx.h
> > index 1dbb320..3999cb1 100644
> > --- a/block/vhdx.h
> > +++ b/block/vhdx.h
> > @@ -309,17 +309,27 @@ typedef struct QEMU_PACKED VHDXParentLocatorEntry {
> >  /* ----- END VHDX SPECIFICATION STRUCTURES ---- */
> >  
> >  
> > +void vhdx_guid_generate(MSGUID *guid);
> > +
> > +uint32_t vhdx_update_checksum(uint8_t *buf, size_t size, int crc_offset);
> >  uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size,
> >                              int crc_offset);
> >  
> >  bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset);
> >  
> >  
> > -static void leguid_to_cpus(MSGUID *guid)
> > +static inline void leguid_to_cpus(MSGUID *guid)
> >  {
> >      le32_to_cpus(&guid->data1);
> >      le16_to_cpus(&guid->data2);
> >      le16_to_cpus(&guid->data3);
> >  }
> >  
> > +static inline void cpu_to_leguids(MSGUID *guid)
> > +{
> > +    cpu_to_le32s(&guid->data1);
> > +    cpu_to_le16s(&guid->data2);
> > +    cpu_to_le16s(&guid->data3);
> > +}
> > +
> >  #endif
> > diff --git a/configure b/configure
> > index 877a821..821b790 100755
> > --- a/configure
> > +++ b/configure
> > @@ -244,6 +244,7 @@ gtk=""
> >  gtkabi="2.0"
> >  tpm="no"
> >  libssh2=""
> > +vhdx="yes"
> >  
> >  # parse CC options first
> >  for opt do
> > @@ -950,6 +951,11 @@ for opt do
> >    ;;
> >    --enable-libssh2) libssh2="yes"
> >    ;;
> > +  --enable-vhdx) vhdx="yes" ;
> > +                 uuid="yes"
> > +  ;;
> > +  --disable-vhdx) vhdx="no"
> > +  ;;
> >    *) echo "ERROR: unknown option $opt"; show_help="yes"
> >    ;;
> >    esac
> > @@ -1166,6 +1172,8 @@ echo "  --gcov=GCOV              use specified gcov [$gcov_tool]"
> >  echo "  --enable-tpm             enable TPM support"
> >  echo "  --disable-libssh2        disable ssh block device support"
> >  echo "  --enable-libssh2         enable ssh block device support"
> > +echo "  --disable-vhdx           disable support for the Microsoft VHDX image format"
> > +echo "  --enable-vhdx            enable support for the Microsoft VHDX image format"
> >  echo ""
> >  echo "NOTE: The object files are built at the place where configure is launched"
> >  exit 1
> > @@ -3622,6 +3630,7 @@ echo "TPM support       $tpm"
> >  echo "libssh2 support   $libssh2"
> >  echo "TPM passthrough   $tpm_passthrough"
> >  echo "QOM debugging     $qom_cast_debug"
> > +echo "vhdx              $vhdx"
> >  
> >  if test "$sdl_too_old" = "yes"; then
> >  echo "-> Your SDL version is too old - please upgrade to have SDL support"
> > @@ -4012,6 +4021,10 @@ if test "$virtio_blk_data_plane" = "yes" ; then
> >    echo 'CONFIG_VIRTIO_BLK_DATA_PLANE=$(CONFIG_VIRTIO)' >> $config_host_mak
> >  fi
> >  
> > +if test "$vhdx" = "yes" ; then
> > +  echo "CONFIG_VHDX=y" >> $config_host_mak
> > +fi
> > +
> >  # USB host support
> >  case "$usb" in
> >  linux)
> > -- 
> > 1.8.1.4
> > 
> > 
> 
> -- 
> Fam


* Re: [Qemu-devel] [PATCH 4/9] block: vhdx - log support struct and defines
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 4/9] block: vhdx - log support struct and defines Jeff Cody
@ 2013-07-30  3:15   ` Fam Zheng
  2013-07-30 13:42     ` Jeff Cody
  0 siblings, 1 reply; 19+ messages in thread
From: Fam Zheng @ 2013-07-30  3:15 UTC (permalink / raw)
  To: Jeff Cody; +Cc: kwolf, qemu-devel, stefanha

On Wed, 07/24 13:54, Jeff Cody wrote:
> This adds some magic number defines, and internal structure
> definitions for VHDX log replay support.
> 
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
>  block/vhdx.h | 21 ++++++++++++++++++++-
>  1 file changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/block/vhdx.h b/block/vhdx.h
> index c8d8593..2db6615 100644
> --- a/block/vhdx.h
> +++ b/block/vhdx.h
> @@ -151,7 +151,10 @@ typedef struct QEMU_PACKED VHDXRegionTableEntry {
>  
>  
>  /* ---- LOG ENTRY STRUCTURES ---- */
> +#define VHDX_LOG_MIN_SIZE (1024*1024)
> +#define VHDX_LOG_SECTOR_SIZE 4096
>  #define VHDX_LOG_HDR_SIZE 64
> +#define VHDX_LOG_SIGNATURE 0x65676f6c
>  typedef struct QEMU_PACKED VHDXLogEntryHeader {
>      uint32_t    signature;              /* "loge" in ASCII */
>      uint32_t    checksum;               /* CRC-32C hash of the 64KB table */
> @@ -174,7 +177,8 @@ typedef struct QEMU_PACKED VHDXLogEntryHeader {
>  } VHDXLogEntryHeader;
>  
>  #define VHDX_LOG_DESC_SIZE 32
> -
> +#define VHDX_LOG_DESC_SIGNATURE 0x63736564
> +#define VHDX_LOG_ZERO_SIGNATURE 0x6f72657a

Are these macros really used? I see "desc" and "zero" used to compare
signatures.
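
To make the relationship concrete: the three defines are just the ASCII strings read as little-endian 32-bit integers, so an integer compare against the macro (after converting the field from little-endian) would be one alternative to memcmp() on the raw bytes. A small sketch, where sig_le32() is a hypothetical helper and not something in the patch:

```c
#include <stdint.h>

/* Values copied from the quoted patch. */
#define VHDX_LOG_SIGNATURE      0x65676f6c  /* "loge" */
#define VHDX_LOG_DESC_SIGNATURE 0x63736564  /* "desc" */
#define VHDX_LOG_ZERO_SIGNATURE 0x6f72657a  /* "zero" */

/* Hypothetical helper: read four on-disk bytes as a little-endian
 * 32-bit value, independent of host byte order. */
static uint32_t sig_le32(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

With this, sig_le32("desc") equals VHDX_LOG_DESC_SIGNATURE, and likewise for "zero" and "loge".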

Thanks

Fam


* Re: [Qemu-devel] [PATCH 7/9] block: vhdx - log parsing, replay, and flush support
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 7/9] block: vhdx - log parsing, replay, and flush support Jeff Cody
@ 2013-07-30  3:48   ` Fam Zheng
  2013-07-30 13:58     ` Jeff Cody
  0 siblings, 1 reply; 19+ messages in thread
From: Fam Zheng @ 2013-07-30  3:48 UTC (permalink / raw)
  To: Jeff Cody; +Cc: kwolf, qemu-devel, stefanha

On Wed, 07/24 13:54, Jeff Cody wrote:
> This adds support for VHDX v0 logs, as specified in Microsoft's
> VHDX Format Specification v1.00:
> https://www.microsoft.com/en-us/download/details.aspx?id=34750
> 
> The following support is added:
> 
> * Log parsing, and validation - validate that an existing log
>   is correct.
> 
> * Log search - search through an existing log, to find any valid
>   sequence of entries.
> 
> * Log replay and flush - replay an existing log, and flush/clear
>   the log when complete.
> 
> The VHDX log is a circular buffer, with elements (sectors) of 4KB.
> 
> A log entry is a variable-length run of sectors, comprising a header and
> 'descriptors' that describe each sector.
> 
> A log may contain multiple entries, known as a log sequence.  In a log
> sequence, each log entry immediately follows the previous entry, with an
> incrementing sequence number.  There can only ever be one active and
> valid sequence in the log.
> 
> Each log entry must match the file log GUID in order to be valid (along
> with other criteria).  Once we have flushed all valid log entries, we
> set the file log GUID to zero, which indicates a buffer with no
> valid entries.
> 
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
>  block/Makefile.objs |   2 +-
>  block/vhdx-log.c    | 734 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  block/vhdx.c        |  44 +---
>  block/vhdx.h        |   7 +-
>  4 files changed, 743 insertions(+), 44 deletions(-)
>  create mode 100644 block/vhdx-log.c
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index e6f5d33..2fbd79a 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -2,7 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
>  block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>  block-obj-y += qed-check.o
> -block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o
> +block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
>  block-obj-y += parallels.o blkdebug.o blkverify.o
>  block-obj-y += snapshot.o qapi.o
>  block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
> diff --git a/block/vhdx-log.c b/block/vhdx-log.c
> new file mode 100644
> index 0000000..89b9000
> --- /dev/null
> +++ b/block/vhdx-log.c
> @@ -0,0 +1,734 @@
> +/*
> + * Block driver for Hyper-V VHDX Images
> + *
> + * Copyright (c) 2013 Red Hat, Inc.,
> + *
> + * Authors:
> + *  Jeff Cody <jcody@redhat.com>
> + *
> + *  This is based on the "VHDX Format Specification v1.00", published 8/25/2012
> + *  by Microsoft:
> + *      https://www.microsoft.com/en-us/download/details.aspx?id=34750
> + *
> + * This file covers the functionality of the metadata log writing, parsing, and
> + * replay.
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +#include "qemu-common.h"
> +#include "block/block_int.h"
> +#include "qemu/module.h"
> +#include "block/vhdx.h"
> +
> +
> +typedef struct VHDXLogSequence {
> +    bool valid;
> +    uint32_t count;
> +    VHDXLogEntries log;
> +    VHDXLogEntryHeader hdr;
> +} VHDXLogSequence;
> +
> +typedef struct VHDXLogDescEntries {
> +    VHDXLogEntryHeader hdr;
> +    VHDXLogDescriptor desc[];
> +} VHDXLogDescEntries;
> +
> +
> +/* Returns true if the GUID is zero */
> +static bool vhdx_log_guid_is_zero(MSGUID *guid)
> +{
> +    int i;
> +    int ret = 0;
> +
> +    /* If either the log guid, or log length is zero,
> +     * then a replay log is not present */
> +    for (i = 0; i < sizeof(MSGUID); i++) {
> +        ret |= ((uint8_t *) guid)[i];
> +    }
> +
> +    return ret == 0;
> +}
> +
> +/* The log located on the disk is a circular buffer containing
> + * sectors of 4096 bytes each.
> + *
> + * It is assumed for the read/write functions below that the
> + * circular buffer scheme uses a 'one sector open' to indicate
> + * the buffer is full.  Given the validation methods used for each
> + * sector, this method should be compatible with other methods that
> + * do not waste a sector.
> + */
> +
> +
> +/* Allow peeking at the hdr entry at the beginning of the current
> + * read index, without advancing the read index */
> +static int vhdx_log_peek_hdr(BlockDriverState *bs, VHDXLogEntries *log,
> +                             VHDXLogEntryHeader *hdr)
> +{
> +    int ret = 0;
> +    uint64_t offset;
> +    uint32_t read;
> +
> +    assert(hdr != NULL);
> +
> +    /* peek is only support on sector boundaries */

s/support/supported/

> +    if (log->read % VHDX_LOG_SECTOR_SIZE) {
> +        ret = -EFAULT;
> +        goto exit;
> +    }
> +
> +    read = log->read;
> +    /* we are guaranteed that a) log sectors are 4096 bytes,
> +     * and b) the log length is a multiple of 1MB. So, there
> +     * is always a round number of sectors in the buffer */
> +    if ((read + sizeof(VHDXLogEntryHeader)) > log->length) {
> +        read = 0;
> +    }
> +
> +    if (read == log->write) {
> +        ret = -EINVAL;
> +        goto exit;
> +    }
> +
> +    offset = log->offset + read;
> +
> +    ret = bdrv_pread(bs->file, offset, hdr, sizeof(VHDXLogEntryHeader));
> +    if (ret < 0) {
> +        goto exit;
> +    }
> +
> +exit:
> +    return ret;
> +}
> +
> +/* Index increment for log, based on sector boundaries */
> +static int vhdx_log_inc_idx(uint32_t idx, uint64_t length)
> +{
> +    idx += VHDX_LOG_SECTOR_SIZE;
> +    /* we are guaranteed that a) log sectors are 4096 bytes,
> +     * and b) the log length is a multiple of 1MB. So, there
> +     * is always a round number of sectors in the buffer */
> +    return idx >= length ? 0 : idx;
> +}
> +
> +
> +/* Reset the log to empty */
> +static void vhdx_log_reset(BlockDriverState *bs, BDRVVHDXState *s)
> +{
> +    MSGUID guid = { 0 };
> +    s->log.read = s->log.write = 0;
> +    /* a log guid of 0 indicates an empty log to any parser of v0
> +     * VHDX logs */
> +    vhdx_update_headers(bs, s, false, &guid);
> +}
> +
> +/* Reads num_sectors from the log (all log sectors are 4096 bytes),
> + * into buffer 'buffer'.  Upon return, *sectors_read will contain
> + * the number of sectors successfully read.
> + *
> + * It is assumed that 'buffer' is already allocated, and of sufficient
> + * size (i.e. >= 4096*num_sectors).
> + *
> + * If 'peek' is true, then the tail (read) pointer for the circular buffer is
> + * not modified.
> + *
> + * 0 is returned on success, -errno otherwise.  */
> +static int vhdx_log_read_sectors(BlockDriverState *bs, VHDXLogEntries *log,
> +                                 uint32_t *sectors_read, void *buffer,
> +                                 uint32_t num_sectors, bool peek)
> +{
> +    int ret = 0;
> +    uint64_t offset;
> +    uint32_t read;
> +
> +    read = log->read;
> +
> +    *sectors_read = 0;
> +    while (num_sectors) {
> +        if (read == log->write) {
> +            /* empty */
> +            break;
> +        }
> +        offset = log->offset + read;
> +
> +        ret = bdrv_pread(bs->file, offset, buffer, VHDX_LOG_SECTOR_SIZE);
> +        if (ret < 0) {
> +            goto exit;
> +        }
> +        read = vhdx_log_inc_idx(read, log->length);
> +
> +        *sectors_read = *sectors_read + 1;
> +        num_sectors--;
> +    }
> +
> +exit:
> +    if (!peek) {
> +        log->read = read;
> +    }
> +    return ret;
> +}
> +
> +/* Validates a log entry header */
> +static bool vhdx_log_hdr_is_valid(VHDXLogEntries *log, VHDXLogEntryHeader *hdr,
> +                                  BDRVVHDXState *s)
> +{
> +    int valid = false;
> +
> +    if (memcmp(&hdr->signature, "loge", 4)) {
> +        goto exit;
> +    }
> +
> +    /* if the individual entry length is larger than the whole log
> +     * buffer, that is obviously invalid */
> +    if (log->length < hdr->entry_length) {
> +        goto exit;
> +    }
> +
> +    /* length of entire entry must be in units of 4KB (log sector size) */
> +    if (hdr->entry_length % (VHDX_LOG_SECTOR_SIZE)) {
> +        goto exit;
> +    }
> +
> +    /* per spec, sequence # must be > 0 */
> +    if (hdr->sequence_number == 0) {
> +        goto exit;
> +    }
> +
> +    /* log entries are only valid if they match the file-wide log guid
> +     * found in the active header */
> +    if (!guid_eq(hdr->log_guid, s->headers[s->curr_header]->log_guid)) {
> +        goto exit;
> +    }
> +
> +    valid = true;
> +
> +exit:
> +    return valid;
> +}
> +
> +/*
> + * Given a log header, this will validate the descriptors and the
> + * corresponding data sectors (if applicable).
> + *
> + * Validation consists of:
> + *      1. Making sure the sequence number matches the entry header
> + *      2. Verifying a valid signature ('zero' or desc' for descriptors)

s/ desc'/ 'desc'/

> + *      3. File offset field is a multiple of 4KB
> + *      4. If a data descriptor, the corresponding data sector
> + *         has its signature ('data') and matching sequence number
> + *
> + * 'desc' is the data buffer containing the descriptor
> + * hdr is the log entry header

Please use gtkdoc format:

@desc: the data buffer ...
@hdr:  the log entry header
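
As an illustration of the layout meant here, a complete gtk-doc style comment might look like the sketch below. The structures and the function name are hypothetical, trimmed-down stand-ins for the ones in the patch, so that the example compiles on its own. The zero-length check is written with explicit parentheses, since in C a bare `!len % n` parses as `(!len) % n`.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LOG_SECTOR_SIZE 4096   /* mirrors VHDX_LOG_SECTOR_SIZE */

/* Hypothetical, trimmed-down stand-ins for the patch's structures. */
typedef struct LogEntryHeaderSketch {
    uint64_t sequence_number;
} LogEntryHeaderSketch;

typedef struct LogDescriptorSketch {
    char     signature[4];     /* "zero" or "desc" */
    uint64_t zero_length;      /* only meaningful for "zero" */
    uint64_t file_offset;
    uint64_t sequence_number;
} LogDescriptorSketch;

/**
 * log_desc_is_valid_sketch:
 * @desc: the descriptor to check
 * @hdr:  the log entry header the descriptor belongs to
 *
 * Checks the sequence number, the 4KB alignment of the file offset, and
 * the descriptor signature.
 *
 * Returns: true if the descriptor is valid, false otherwise
 */
static bool log_desc_is_valid_sketch(const LogDescriptorSketch *desc,
                                     const LogEntryHeaderSketch *hdr)
{
    if (desc->sequence_number != hdr->sequence_number) {
        return false;
    }
    if (desc->file_offset % LOG_SECTOR_SIZE) {
        return false;
    }
    if (!memcmp(desc->signature, "zero", 4)) {
        /* parenthesized: a bare `!len % n` would parse as `(!len) % n` */
        return (desc->zero_length % LOG_SECTOR_SIZE) == 0;
    }
    return !memcmp(desc->signature, "desc", 4);
}
```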

> + *
> + * Returns true if valid
> + */
> +static bool vhdx_log_desc_is_valid(VHDXLogDescriptor *desc,
> +                                   VHDXLogEntryHeader *hdr)
> +{
> +    bool ret = false;
> +
> +    if (desc->sequence_number != hdr->sequence_number) {
> +        goto exit;
> +    }
> +    if (desc->file_offset % VHDX_LOG_SECTOR_SIZE) {
> +        goto exit;
> +    }
> +
> +    if (!memcmp(&desc->signature, "zero", 4)) {
> +        if (!(desc->zero_length % VHDX_LOG_SECTOR_SIZE)) {
> +            /* valid */
> +            ret = true;
> +        }
> +    } else if (!memcmp(&desc->signature, "desc", 4)) {
> +            /* valid */
> +            ret = true;
> +    }
> +
> +exit:
> +    return ret;
> +}
> +
> +
> +/* Prior to sector data for a log entry, there is the header
> + * and the descriptors referenced in the header:
> + *
> + * [] = 4KB sector
> + *
> + * [ hdr, desc ][   desc   ][ ... ][ data ][ ... ]
> + *
> + * The first sector in a log entry has a 64 byte header, and
> + * up to 126 32-byte descriptors.  If more descriptors than
> + * 126 are required, then subsequent sectors can have up to 128
> + * descriptors.  Each sector is 4KB.  Data follows the descriptor
> + * sectors.
> + *
> + * This will return the number of sectors needed to encompass
> + * the passed number of descriptors in desc_cnt.
> + *
> + * This will never return 0, even if desc_cnt is 0.
> + */
> +static int vhdx_compute_desc_sectors(uint32_t desc_cnt)
> +{
> +    uint32_t desc_sectors;
> +
> +    desc_cnt += 2; /* account for header in first sector */
> +    desc_sectors = desc_cnt / 128;
> +    if (desc_cnt % 128) {
> +        desc_sectors++;
> +    }
> +
> +    return desc_sectors;
> +}
> +
> +
> +/* Reads the log header, and subsequent descriptors (if any).  This
> + * will allocate all the space for buffer, which must be NULL when
> + * passed into this function. Each descriptor will also be validated,
> + * and error returned if any are invalid. */
> +static int vhdx_log_read_desc(BlockDriverState *bs, BDRVVHDXState *s,
> +                              VHDXLogEntries *log, VHDXLogDescEntries **buffer)
> +{
> +    int ret = 0;
> +    uint32_t desc_sectors;
> +    uint32_t sectors_read;
> +    VHDXLogEntryHeader hdr;
> +    VHDXLogDescEntries *desc_entries = NULL;
> +    int i;
> +
> +    assert(*buffer == NULL);
> +
> +    ret = vhdx_log_peek_hdr(bs, log, &hdr);
> +    if (ret < 0) {
> +        goto exit;
> +    }
> +    vhdx_log_entry_hdr_le_import(&hdr);
> +    if (vhdx_log_hdr_is_valid(log, &hdr, s) == false) {
> +        ret = -EINVAL;
> +        goto exit;
> +    }
> +
> +    desc_sectors = vhdx_compute_desc_sectors(hdr.descriptor_count);
> +    desc_entries = qemu_blockalign(bs, desc_sectors * VHDX_LOG_SECTOR_SIZE);
> +
> +    ret = vhdx_log_read_sectors(bs, log, &sectors_read, desc_entries,
> +                                desc_sectors, false);
> +    if (ret < 0) {
> +        goto free_and_exit;
> +    }
> +    if (sectors_read != desc_sectors) {
> +        ret = -EINVAL;
> +        goto free_and_exit;
> +    }
> +
> +    /* put in proper endianness, and validate each desc */
> +    for (i = 0; i < hdr.descriptor_count; i++) {
> +        vhdx_log_desc_le_import(&desc_entries->desc[i]);
> +        if (vhdx_log_desc_is_valid(&desc_entries->desc[i], &hdr) == false) {
> +            ret = -EINVAL;
> +            goto free_and_exit;
> +        }
> +    }
> +
> +    *buffer = desc_entries;
> +    goto exit;
> +
> +free_and_exit:
> +    qemu_vfree(desc_entries);
> +exit:
> +    return ret;
> +}
> +
> +
> +/* Flushes the descriptor described by desc to the VHDX image file.
> + * If the descriptor is a data descriptor, then 'data' must be non-NULL,
> + * and >= 4096 bytes (VHDX_LOG_SECTOR_SIZE), containing the data to be
> + * written.
> + *
> + * Verification is performed to make sure the sequence numbers of a data
> + * descriptor match the sequence number in the desc.
> + *
> + * For a zero descriptor, it may describe multiple sectors to fill with zeroes.
> + * In this case, it should be noted that zeroes are written to disk, and the
> + * image file is not extended as a sparse file.  */
> +static int vhdx_log_flush_desc(BlockDriverState *bs, VHDXLogDescriptor *desc,
> +                               VHDXLogDataSector *data)
> +{
> +    int ret = 0;
> +    uint64_t seq, file_offset;
> +    uint32_t offset = 0;
> +    void *buffer = NULL;
> +    uint64_t count = 1;
> +    int i;
> +
> +    buffer = qemu_blockalign(bs, VHDX_LOG_SECTOR_SIZE);
> +
> +    if (!memcmp(&desc->signature, "desc", 4)) {
> +        /* data sector */
> +        if (data == NULL) {
> +            ret = -EFAULT;
> +            goto exit;
> +        }
> +
> +        /* The sequence number of the data sector must match that
> +         * in the descriptor */
> +        seq = data->sequence_high;
> +        seq <<= 32;
> +        seq |= data->sequence_low & 0xffffffff;
> +
> +        if (seq != desc->sequence_number) {
> +            ret = -EINVAL;
> +            goto exit;
> +        }
> +
> +        /* Each data sector is in total 4096 bytes, however the first
> +         * 8 bytes, and last 4 bytes, are located in the descriptor */
> +        memcpy(buffer, &desc->leading_bytes, sizeof(desc->leading_bytes));
> +        offset += sizeof(desc->leading_bytes);
> +
> +        memcpy(buffer+offset, data->data, 4084);
> +        offset += 4084;

Could you use sizeof(data->data) instead of 4084?
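
Roughly what I have in mind is sketched below. The two structures are assumed layouts, reconstructed from the field names used in this function (the real definitions are in block/vhdx.h), and rebuild_sector() is a hypothetical name. With sizeof on each piece, the three copies add up to one 4096-byte sector (8 + 4084 + 4) without a magic number.

```c
#include <stdint.h>
#include <string.h>

#define LOG_SECTOR_SIZE 4096

/* Assumed layouts, reconstructed from the field names used in the quoted
 * code; the real definitions live in block/vhdx.h. */
typedef struct DataSectorSketch {
    uint32_t data_signature;
    uint32_t sequence_high;
    uint8_t  data[4084];
    uint32_t sequence_low;
} DataSectorSketch;

typedef struct DescSketch {
    uint32_t signature;
    uint32_t trailing_bytes;
    uint64_t leading_bytes;
    uint64_t file_offset;
    uint64_t sequence_number;
} DescSketch;

/* Reassemble one 4096-byte sector: leading bytes from the descriptor,
 * the payload from the data sector, trailing bytes from the descriptor.
 * 'out' must hold at least LOG_SECTOR_SIZE bytes. */
static void rebuild_sector(uint8_t *out, const DescSketch *desc,
                           const DataSectorSketch *data)
{
    size_t offset = 0;

    memcpy(out + offset, &desc->leading_bytes, sizeof(desc->leading_bytes));
    offset += sizeof(desc->leading_bytes);

    memcpy(out + offset, data->data, sizeof(data->data));
    offset += sizeof(data->data);

    memcpy(out + offset, &desc->trailing_bytes, sizeof(desc->trailing_bytes));
}
```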

> +
> +        memcpy(buffer+offset, &desc->trailing_bytes,
> +               sizeof(desc->trailing_bytes));
> +
> +    } else if (!memcmp(&desc->signature, "zero", 4)) {
> +        /* write 'count' sectors of zeroes */
> +        memset(buffer, 0, VHDX_LOG_SECTOR_SIZE);
> +        count = desc->zero_length / VHDX_LOG_SECTOR_SIZE;
> +    }
> +
> +    file_offset = desc->file_offset;
> +
> +    /* count is only > 1 if we are writing zeroes */
> +    for (i = 0; i < count; i++) {
> +        ret = bdrv_pwrite_sync(bs->file, file_offset, buffer,
> +                               VHDX_LOG_SECTOR_SIZE);
> +        if (ret < 0) {
> +            goto exit;
> +        }
> +        file_offset += VHDX_LOG_SECTOR_SIZE;
> +    }
> +
> +exit:
> +    qemu_vfree(buffer);
> +    return ret;
> +}
> +
> +/* Flush the entire log (as described by 'logs') to the VHDX image
> + * file, and then set the log to 'empty' status once complete.
> + *
> + * The log entries should be validated prior to flushing */
> +static int vhdx_log_flush(BlockDriverState *bs, BDRVVHDXState *s,
> +                          VHDXLogSequence *logs)
> +{
> +    int ret = 0;
> +    int i;
> +    uint32_t cnt, sectors_read;
> +    uint64_t new_file_size;
> +    void *data = NULL;
> +    VHDXLogDescEntries *desc_entries = NULL;
> +    VHDXLogEntryHeader hdr_tmp = { 0 };
> +
> +    cnt = logs->count;
> +
> +    data = qemu_blockalign(bs, VHDX_LOG_SECTOR_SIZE);
> +
> +    vhdx_user_visible_write(bs, s);
> +
> +    /* each iteration represents one log sequence, which may span multiple
> +     * sectors */
> +    while (cnt--) {
> +        ret = vhdx_log_peek_hdr(bs, &logs->log, &hdr_tmp);
> +        if (ret < 0) {
> +            goto exit;
> +        }
> +        /* if the log shows a FlushedFileOffset larger than our current file
> +         * size, then that means the file has been truncated / corrupted, and
> +         * we must refuse to open it / use it */
> +        if (hdr_tmp.flushed_file_offset > bdrv_getlength(bs->file)) {
> +            ret = -EINVAL;
> +            goto exit;
> +        }
> +
> +        ret = vhdx_log_read_desc(bs, s, &logs->log, &desc_entries);
> +        if (ret < 0) {
> +            goto exit;
> +        }
> +
> +        for (i = 0; i < desc_entries->hdr.descriptor_count; i++) {
> +            if (!memcmp(&desc_entries->desc[i].signature, "desc", 4)) {
> +                /* data sector, so read a sector to flush */
> +                ret = vhdx_log_read_sectors(bs, &logs->log, &sectors_read,
> +                                            data, 1, false);
> +                if (ret < 0) {
> +                    goto exit;
> +                }
> +                if (sectors_read != 1) {
> +                    ret = -EINVAL;
> +                    goto exit;
> +                }
> +            }
> +
> +            ret = vhdx_log_flush_desc(bs, &desc_entries->desc[i], data);
> +            if (ret < 0) {
> +                goto exit;
> +            }
> +        }
> +        if (bdrv_getlength(bs->file) < desc_entries->hdr.last_file_offset) {
> +            new_file_size = desc_entries->hdr.last_file_offset;
> +            if (new_file_size % (1024*1024)) {
> +                /* round up to nearest 1MB boundary */
> +                new_file_size = ((new_file_size >> 20) + 1) << 20;
> +                bdrv_truncate(bs->file, new_file_size);
> +            }
> +        }
> +        qemu_vfree(desc_entries);
> +        desc_entries = NULL;
> +    }
> +
> +    /* once the log is fully flushed, indicate that we have an empty log
> +     * now.  This also sets the log guid to 0, to indicate an empty log */
> +    vhdx_log_reset(bs, s);
> +
> +exit:
> +    qemu_vfree(data);
> +    qemu_vfree(desc_entries);
> +    return ret;
> +}
> +
> +static int vhdx_validate_log_entry(BlockDriverState *bs, BDRVVHDXState *s,
> +                                   VHDXLogEntries *log, uint64_t seq,
> +                                   bool *valid, VHDXLogEntryHeader *entry)
> +{
> +    int ret = 0;
> +    VHDXLogEntryHeader hdr;
> +    void *buffer = NULL;
> +    uint32_t i, desc_sectors, total_sectors, crc;
> +    uint32_t sectors_read = 0;
> +    VHDXLogDescEntries *desc_buffer = NULL;
> +
> +    *valid = false;
> +
> +    ret = vhdx_log_peek_hdr(bs, log, &hdr);
> +    if (ret < 0) {
> +        goto inc_and_exit;
> +    }
> +
> +    vhdx_log_entry_hdr_le_import(&hdr);
> +
> +
> +    if (vhdx_log_hdr_is_valid(log, &hdr, s) == false) {
> +        goto inc_and_exit;
> +    }
> +
> +    if (seq > 0) {
> +        if (hdr.sequence_number != seq + 1) {
> +            goto inc_and_exit;
> +        }
> +    }
> +
> +    desc_sectors = vhdx_compute_desc_sectors(hdr.descriptor_count);
> +
> +    /* Read desc sectors, and calculate log checksum */
> +
> +    total_sectors = hdr.entry_length / VHDX_LOG_SECTOR_SIZE;
> +
> +
> +    /* read_desc() will increment the read idx */
> +    ret = vhdx_log_read_desc(bs, s, log, &desc_buffer);
> +    if (ret < 0) {
> +        goto free_and_exit;
> +    }
> +
> +    crc = vhdx_checksum_calc(0xffffffff, (void *)desc_buffer,
> +                            desc_sectors * VHDX_LOG_SECTOR_SIZE, 4);
> +    crc ^= 0xffffffff;
> +
> +    buffer = qemu_blockalign(bs, VHDX_LOG_SECTOR_SIZE);
> +    if (total_sectors > desc_sectors) {
> +        for (i = 0; i < total_sectors - desc_sectors; i++) {
> +            sectors_read = 0;
> +            ret = vhdx_log_read_sectors(bs, log, &sectors_read, buffer,
> +                                        1, false);
> +            if (ret < 0 || sectors_read != 1) {
> +                goto free_and_exit;
> +            }
> +            crc = vhdx_checksum_calc(crc, buffer, VHDX_LOG_SECTOR_SIZE, -1);
> +            crc ^= 0xffffffff;
> +        }
> +    }
> +    crc ^= 0xffffffff;
> +    if (crc != desc_buffer->hdr.checksum) {
> +        goto free_and_exit;
> +    }
> +
> +    *valid = true;
> +    *entry = hdr;
> +    goto free_and_exit;
> +
> +inc_and_exit:
> +    log->read = vhdx_log_inc_idx(log->read, log->length);
> +
> +free_and_exit:
> +    qemu_vfree(buffer);
> +    qemu_vfree(desc_buffer);
> +    return ret;
> +}
> +
> +/* Search through the log circular buffer, and find the valid, active
> + * log sequence, if any exists
> + * */
> +static int vhdx_log_search(BlockDriverState *bs, BDRVVHDXState *s,
> +                           VHDXLogSequence *logs)
> +{
> +    int ret = 0;
> +
> +    uint64_t curr_seq = 0;
> +    VHDXLogSequence candidate = { 0 };
> +    VHDXLogSequence current = { 0 };
> +
> +    uint32_t tail;
> +    bool seq_valid = false;
> +    VHDXLogEntryHeader hdr = { 0 };
> +    VHDXLogEntries curr_log;
> +
> +    memcpy(&curr_log, &s->log, sizeof(VHDXLogEntries));
> +    curr_log.write = curr_log.length;   /* assume log is full */
> +    curr_log.read = 0;
> +
> +
> +    /* now we will go through the whole log sector by sector, until
> +     * we find a valid, active log sequence, or reach the end of the
> +     * log buffer */
> +    for (;;) {
> +        tail = curr_log.read;
> +
> +        curr_seq = 0;
> +        memset(&current, 0, sizeof(current));
> +
> +        ret = vhdx_validate_log_entry(bs, s, &curr_log, curr_seq,
> +                                      &seq_valid, &hdr);
> +        if (ret < 0) {
> +            goto exit;
> +        }
> +
> +        if (seq_valid) {
> +            current.valid     = true;
> +            current.log       = curr_log;
> +            current.log.read  = tail;
> +            current.log.write = curr_log.read;
> +            current.count     = 1;
> +            current.hdr       = hdr;
> +
> +
> +            for (;;) {
> +                ret = vhdx_validate_log_entry(bs, s, &curr_log, curr_seq,
> +                                              &seq_valid, &hdr);
> +                if (ret < 0) {
> +                    goto exit;
> +                }
> +                if (seq_valid == false) {
> +                    break;
> +                }
> +                current.log.write = curr_log.read;
> +                current.count++;
> +
> +                curr_seq = hdr.sequence_number;
> +            }
> +        }
> +
> +        if (current.valid) {
> +            if (candidate.valid == false ||
> +                current.hdr.sequence_number > candidate.hdr.sequence_number) {
> +                candidate = current;
> +            }
> +        }
> +
> +        if (curr_log.read < tail) {
> +            break;
> +        }
> +    }
> +
> +    *logs = candidate;
> +
> +    if (candidate.valid) {
> +        /* this is the next sequence number, for writes */
> +        s->log.sequence = candidate.hdr.sequence_number + 1;
> +    }
> +
> +
> +exit:
> +    return ret;
> +}
> +
> +/* Parse the replay log.  Per the VHDX spec, if the log is present
> + * it must be replayed prior to opening the file, even read-only.
> + *
> + * If read-only, we must replay the log in RAM (or refuse to open
> + * a dirty VHDX file read-only */
> +int vhdx_parse_log(BlockDriverState *bs, BDRVVHDXState *s)
> +{
> +    int ret = 0;
> +    VHDXHeader *hdr;
> +    VHDXLogSequence logs = { 0 };
> +
> +    hdr = s->headers[s->curr_header];
> +
> +    /* s->log.hdr is freed in vhdx_close() */
> +    if (s->log.hdr == NULL) {
> +        s->log.hdr = qemu_blockalign(bs, sizeof(VHDXLogEntryHeader));
> +    }
> +
> +    s->log.offset = hdr->log_offset;
> +    s->log.length = hdr->log_length;
> +
> +    if (s->log.offset < VHDX_LOG_MIN_SIZE ||
> +        s->log.offset % VHDX_LOG_MIN_SIZE) {
> +        ret = -EINVAL;
> +        goto exit;
> +    }
> +
> +    /* per spec, only log version of 0 is supported */
> +    if (hdr->log_version != 0) {
> +        ret = -EINVAL;
> +        goto exit;
> +    }
> +
> +    /* If either the log guid, or log length is zero,
> +     * then a replay log is not present */
> +    if (vhdx_log_guid_is_zero(&hdr->log_guid)) {
> +        goto exit;
> +    }
> +
> +
Too many blank lines here.
> +
> +    if (hdr->log_length == 0) {
> +        goto exit;
> +    }
> +
> +    if (hdr->log_length % VHDX_LOG_MIN_SIZE) {
> +        ret = -EINVAL;
> +        goto exit;
> +    }
> +
> +
> +    /* The log is present, we need to find if and where there is an active
> +     * sequence of valid entries present in the log.  */
> +
> +    ret = vhdx_log_search(bs, s, &logs);
> +    if (ret < 0) {
> +        goto exit;
> +    }
> +
> +    if (logs.valid) {
> +        /* now flush the log */
> +        ret = vhdx_log_flush(bs, s, &logs);

Does it flush the log even when the image is opened read-only?
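
To make the question concrete, here is a minimal sketch of one possible policy, assuming the driver prefers to refuse a dirty image over writing to a file opened read-only. The helper name log_replay_policy() and its return convention are hypothetical and not part of the patch; the comment above vhdx_parse_log() also mentions replaying the log in RAM as an alternative.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper, not part of the patch: decide what to do with a
 * log sequence depending on how the image was opened.
 *
 * Returns 0 if there is nothing to replay, 1 if the caller may flush the
 * log to the file, and -EPERM if a valid log exists but the image is
 * read-only (assumed policy: refuse rather than modify the file). */
static int log_replay_policy(bool log_valid, bool read_only)
{
    if (!log_valid) {
        return 0;
    }
    if (read_only) {
        return -EPERM;
    }
    return 1;
}
```

With something like this, the flush in vhdx_parse_log() would only run when the policy returns 1.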

> +    }
> +
> +
> +exit:
> +    return ret;
> +}
> +
> diff --git a/block/vhdx.c b/block/vhdx.c
> index f5689c3..a8dd6d7 100644
> --- a/block/vhdx.c
> +++ b/block/vhdx.c
> @@ -735,48 +735,6 @@ exit:
>      return ret;
>  }
>  
> -/* Parse the replay log.  Per the VHDX spec, if the log is present
> - * it must be replayed prior to opening the file, even read-only.
> - *
> - * If read-only, we must replay the log in RAM (or refuse to open
> - * a dirty VHDX file read-only */
> -static int vhdx_parse_log(BlockDriverState *bs, BDRVVHDXState *s)
> -{
> -    int ret = 0;
> -    int i;
> -    VHDXHeader *hdr;
> -
> -    hdr = s->headers[s->curr_header];
> -
> -    /* either the log guid, or log length is zero,
> -     * then a replay log is present */
> -    for (i = 0; i < sizeof(hdr->log_guid.data4); i++) {
> -        ret |= hdr->log_guid.data4[i];
> -    }
> -    if (hdr->log_guid.data1 == 0 &&
> -        hdr->log_guid.data2 == 0 &&
> -        hdr->log_guid.data3 == 0 &&
> -        ret == 0) {
> -        goto exit;
> -    }
> -
> -    /* per spec, only log version of 0 is supported */
> -    if (hdr->log_version != 0) {
> -        ret = -EINVAL;
> -        goto exit;
> -    }
> -
> -    if (hdr->log_length == 0) {
> -        goto exit;
> -    }
> -
> -    /* We currently do not support images with logs to replay */
> -    ret = -ENOTSUP;
> -
> -exit:
> -    return ret;
> -}
> -
>  
>  static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
>  {
> @@ -789,6 +747,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
>  
>      s->bat = NULL;
>      s->first_visible_write = true;
> +    s->log.write = s->log.read = 0;
>  
>      qemu_co_mutex_init(&s->lock);
>  
> @@ -1030,6 +989,7 @@ static void vhdx_close(BlockDriverState *bs)
>      qemu_vfree(s->headers[1]);
>      qemu_vfree(s->bat);
>      qemu_vfree(s->parent_entries);
> +    qemu_vfree(s->log.hdr);
>  }
>  
>  static BlockDriver bdrv_vhdx = {
> diff --git a/block/vhdx.h b/block/vhdx.h
> index cb3ce0e..24b126e 100644
> --- a/block/vhdx.h
> +++ b/block/vhdx.h
> @@ -326,7 +326,11 @@ typedef struct VHDXMetadataEntries {
>  typedef struct VHDXLogEntries {
>      uint64_t offset;
>      uint64_t length;
> -    uint32_t head;
> +    uint32_t write;
> +    uint32_t read;
> +    VHDXLogEntryHeader *hdr;
> +    void *desc_buffer;
> +    uint64_t sequence;
>      uint32_t tail;
>  } VHDXLogEntries;
>  
> @@ -387,6 +391,7 @@ uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size,
>  
>  bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset);
>  
> +int vhdx_parse_log(BlockDriverState *bs, BDRVVHDXState *s);
>  
>  static inline void leguid_to_cpus(MSGUID *guid)
>  {
> -- 
> 1.8.1.4
> 
> 

-- 
Fam


* Re: [Qemu-devel] [PATCH 8/9] block: vhdx - add log write support
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 8/9] block: vhdx - add log write support Jeff Cody
@ 2013-07-30  3:57   ` Fam Zheng
  2013-07-30 14:11     ` Jeff Cody
  0 siblings, 1 reply; 19+ messages in thread
From: Fam Zheng @ 2013-07-30  3:57 UTC (permalink / raw)
  To: Jeff Cody; +Cc: kwolf, qemu-devel, stefanha

On Wed, 07/24 13:54, Jeff Cody wrote:
> This adds support for writing to the VHDX log.
> 
> For spec details, see VHDX Specification Format v1.00:
> https://www.microsoft.com/en-us/download/details.aspx?id=34750
> 
> There are a few limitations to this log support:
> 1.) There is no caching yet
> 2.) The log is flushed after each entry
> 
> The primary write interface, vhdx_log_write_and_flush(), performs a log
> write followed by an immediate flush of the log.
> 
> As each log entry sector is a minimum of 4KB, partial sector writes are
> filled in with data from the disk write destination.
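
As an illustration of the partial-sector handling described above, the following sketch mirrors the arithmetic vhdx_log_write() uses to split a write into a leading partial sector, whole sectors, and a trailing partial sector. The LogSplit struct and log_split() are hypothetical names introduced only for this example; LOG_SECTOR_SIZE stands in for VHDX_LOG_SECTOR_SIZE.

```c
#include <stdint.h>

#define LOG_SECTOR_SIZE 4096u   /* stand-in for VHDX_LOG_SECTOR_SIZE */

/* Hypothetical result type for this sketch only */
typedef struct {
    uint64_t file_offset;   /* write offset rounded down to a sector */
    uint32_t leading;       /* bytes in a partial first sector, or 0 */
    uint32_t trailing;      /* bytes in a partial last sector, or 0 */
    uint32_t sectors;       /* data sectors needed, partial ones included */
} LogSplit;

static LogSplit log_split(uint64_t offset, uint32_t length)
{
    LogSplit sp = { 0 };
    uint32_t sector_offset = (uint32_t)(offset % LOG_SECTOR_SIZE);
    uint32_t aligned = length;

    sp.file_offset = offset - sector_offset;

    if (sector_offset) {
        /* the write starts mid-sector: the head shares a sector with
         * data that already exists at the destination */
        sp.leading = LOG_SECTOR_SIZE - sector_offset;
        if (sp.leading > length) {
            sp.leading = length;
        }
        aligned -= sp.leading;
        sp.sectors++;
    }

    sp.sectors += aligned / LOG_SECTOR_SIZE;
    sp.trailing = aligned % LOG_SECTOR_SIZE;
    if (sp.trailing) {
        /* the write ends mid-sector: the tail is merged the same way */
        sp.sectors++;
    }
    return sp;
}
```

For example, a 200-byte write at offset 4000 touches two log sectors: 96 leading bytes in the first and 104 trailing bytes in the second.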
> 
> If the current file log GUID is 0, a new GUID is generated and updated
> in the header.
> 
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
>  block/vhdx-log.c | 273 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  block/vhdx.h     |   3 +
>  2 files changed, 276 insertions(+)
> 
> diff --git a/block/vhdx-log.c b/block/vhdx-log.c
> index 89b9000..786b393 100644
> --- a/block/vhdx-log.c
> +++ b/block/vhdx-log.c
> @@ -170,6 +170,53 @@ exit:
>      return ret;
>  }
>  
> +/* Writes num_sectors to the log (all log sectors are 4096 bytes),
> + * from buffer 'buffer'.  Upon return, *sectors_written will contain
> + * the number of sectors successfully written.
> + *
> + * It is assumed that 'buffer' is at least 4096*num_sectors large.
> + *
> + * 0 is returned on success, -errno otherwise */
> +static int vhdx_log_write_sectors(BlockDriverState *bs, VHDXLogEntries *log,
> +                                  uint32_t *sectors_written, void *buffer,
> +                                  uint32_t num_sectors)
> +{
> +    int ret = 0;
> +    uint64_t offset;
> +    uint32_t write;
> +    void *buffer_tmp;
> +    BDRVVHDXState *s = bs->opaque;
> +
> +    vhdx_user_visible_write(bs, s);
> +
> +    write = log->write;
> +
> +    buffer_tmp = buffer;
> +    while (num_sectors) {
> +
> +        offset = log->offset + write;
> +        write = vhdx_log_inc_idx(write, log->length);
> +        if (write == log->read) {
> +            /* full */
> +            break;
> +        }
> +        ret = bdrv_pwrite_sync(bs->file, offset, buffer_tmp,
> +                               VHDX_LOG_SECTOR_SIZE);
> +        if (ret < 0) {
> +            goto exit;
> +        }
> +        buffer_tmp += VHDX_LOG_SECTOR_SIZE;
> +
> +        log->write = write;
> +        *sectors_written = *sectors_written + 1;
> +        num_sectors--;
> +    }
> +
> +exit:
> +    return ret;
> +}
> +
> +
>  /* Validates a log entry header */
>  static bool vhdx_log_hdr_is_valid(VHDXLogEntries *log, VHDXLogEntryHeader *hdr,
>                                    BDRVVHDXState *s)
> @@ -732,3 +779,229 @@ exit:
>      return ret;
>  }
>  
> +
> +
> +static void vhdx_log_raw_to_le_sector(VHDXLogDescriptor *desc,
> +                                      VHDXLogDataSector *sector, void *data,
> +                                      uint64_t seq)
> +{
> +    memcpy(&desc->leading_bytes, data, 8);
> +    data += 8;
> +    cpu_to_le64s(&desc->leading_bytes);
> +    memcpy(sector->data, data, 4084);
> +    data += 4084;
> +    memcpy(&desc->trailing_bytes, data, 4);
> +    cpu_to_le32s(&desc->trailing_bytes);
> +    data += 4;
> +
> +    sector->sequence_high  = (uint32_t) (seq >> 32);
> +    sector->sequence_low   = (uint32_t) (seq & 0xffffffff);
> +    sector->data_signature = VHDX_LOG_DATA_SIGNATURE;
> +
> +    vhdx_log_desc_le_export(desc);
> +    vhdx_log_data_le_export(sector);
> +}
> +
> +
> +static int vhdx_log_write(BlockDriverState *bs, BDRVVHDXState *s,
> +                          void *data, uint32_t length, uint64_t offset)
> +{
> +    int ret = 0;
> +    void *buffer = NULL;
> +    void *merged_sector = NULL;
> +    void *data_tmp, *sector_write;
> +    unsigned int i;
> +    int sector_offset;
> +    uint32_t desc_sectors, sectors, total_length;
> +    uint32_t sectors_written = 0;
> +    uint32_t aligned_length;
> +    uint32_t leading_length = 0;
> +    uint32_t trailing_length = 0;
> +    uint32_t partial_sectors = 0;
> +    uint32_t bytes_written = 0;
> +    uint64_t file_offset;
> +    VHDXHeader *header;
> +    VHDXLogEntryHeader new_hdr;
> +    VHDXLogDescriptor *new_desc = NULL;
> +    VHDXLogDataSector *data_sector = NULL;
> +    MSGUID new_guid = { 0 };
> +
> +    header = s->headers[s->curr_header];
> +
> +    /* need to have offset read data, and be on 4096 byte boundary */
> +
> +    if (length > header->log_length) {
> +        /* no log present.  we could create a log here instead of failing */

Does a newly created VHDX image have log sectors allocated?

> +        ret = -EINVAL;
> +        goto exit;
> +    }
> +
> +    if (vhdx_log_guid_is_zero(&header->log_guid)) {
> +        vhdx_guid_generate(&new_guid);
> +        vhdx_update_headers(bs, s, false, &new_guid);
> +    } else {
> +        /* currently, we require that the log be flushed after
> +         * every write. */
> +        ret = -ENOTSUP;

Can we make an assertion here?

> +    }
> +
> +    /* 0 is an invalid sequence number, but may also represent the first
> +     * log write (or a wrapped seq) */
> +    if (s->log.sequence == 0) {
> +        s->log.sequence = 1;
> +    }
> +
> +    sector_offset = offset % VHDX_LOG_SECTOR_SIZE;
> +    file_offset = (offset / VHDX_LOG_SECTOR_SIZE) * VHDX_LOG_SECTOR_SIZE;
> +
> +    aligned_length = length;
> +
> +    /* add in the unaligned head and tail bytes */
> +    if (sector_offset) {
> +        leading_length = (VHDX_LOG_SECTOR_SIZE - sector_offset);
> +        leading_length = leading_length > length ? length : leading_length;
> +        aligned_length -= leading_length;
> +        partial_sectors++;
> +    }
> +
> +    sectors = aligned_length / VHDX_LOG_SECTOR_SIZE;
> +    trailing_length = aligned_length - (sectors * VHDX_LOG_SECTOR_SIZE);
> +    if (trailing_length) {
> +        partial_sectors++;
> +    }
> +
> +    sectors += partial_sectors;
> +
> +    /* sectors is now how many sectors the data itself takes, not
> +     * including the header and descriptor metadata */
> +
> +    new_hdr = (VHDXLogEntryHeader) {
> +                .signature           = VHDX_LOG_SIGNATURE,
> +                .tail                = s->log.tail,
> +                .sequence_number     = s->log.sequence,
> +                .descriptor_count    = sectors,
> +                .reserved            = 0,
> +                .flushed_file_offset = bdrv_getlength(bs->file),
> +                .last_file_offset    = bdrv_getlength(bs->file),
> +              };
> +
> +    memcpy(&new_hdr.log_guid, &header->log_guid, sizeof(MSGUID));
> +
> +    desc_sectors = vhdx_compute_desc_sectors(new_hdr.descriptor_count);
> +
> +    total_length = (desc_sectors + sectors) * VHDX_LOG_SECTOR_SIZE;
> +    new_hdr.entry_length = total_length;
> +
> +    vhdx_log_entry_hdr_le_export(&new_hdr);
> +
> +    buffer = qemu_blockalign(bs, total_length);
> +    memcpy(buffer, &new_hdr, sizeof(new_hdr));
> +
> +    new_desc = (VHDXLogDescriptor *) (buffer + sizeof(new_hdr));
> +    data_sector = buffer + (desc_sectors * VHDX_LOG_SECTOR_SIZE);
> +    data_tmp = data;
> +
> +    /* All log sectors are 4KB, so for any partial sectors we must
> +     * merge the data with preexisting data from the final file
> +     * destination */
> +    merged_sector = qemu_blockalign(bs, VHDX_LOG_SECTOR_SIZE);
> +
> +    for (i = 0; i < sectors; i++) {
> +        new_desc->signature       = VHDX_LOG_DESC_SIGNATURE;
> +        new_desc->sequence_number = s->log.sequence;
> +        new_desc->file_offset     = file_offset;
> +
> +        if (i == 0 && leading_length) {
> +            /* partial sector at the front of the buffer */
> +            ret = bdrv_pread(bs->file, file_offset, merged_sector,
> +                             VHDX_LOG_SECTOR_SIZE);
> +            if (ret < 0) {
> +                goto exit;
> +            }
> +            memcpy(merged_sector + sector_offset, data_tmp, leading_length);
> +            bytes_written = leading_length;
> +            sector_write = merged_sector;
> +        } else if (i == sectors - 1 && trailing_length) {
> +            /* partial sector at the end of the buffer */
> +            ret = bdrv_pread(bs->file,
> +                            file_offset,
> +                            merged_sector + trailing_length,
> +                            VHDX_LOG_SECTOR_SIZE - trailing_length);
> +            if (ret < 0) {
> +                goto exit;
> +            }
> +            memcpy(merged_sector, data_tmp, trailing_length);
> +            bytes_written = trailing_length;
> +            sector_write = merged_sector;
> +        } else {
> +            bytes_written = VHDX_LOG_SECTOR_SIZE;
> +            sector_write = data_tmp;
> +        }
> +
> +        /* populate the raw sector data into the proper structures,
> +         * as well as update the descriptor, and convert to proper
> +         * endianness */
> +        vhdx_log_raw_to_le_sector(new_desc, data_sector, sector_write,
> +                                  s->log.sequence);
> +
> +        data_tmp += bytes_written;
> +        data_sector++;
> +        new_desc++;
> +        file_offset += VHDX_LOG_SECTOR_SIZE;
> +    }
> +
> +    /* checksum covers entire entry, from the log header through the
> +     * last data sector */
> +    vhdx_update_checksum(buffer, total_length, 4);
> +    cpu_to_le32s((uint32_t *)(buffer + 4));
> +
> +    /* now write to the log */
> +    vhdx_log_write_sectors(bs, &s->log, &sectors_written, buffer,
> +                           desc_sectors + sectors);
> +    if (ret < 0) {
> +        goto exit;
> +    }
> +
> +    if (sectors_written != desc_sectors + sectors) {
> +        /* instead of failing, we could flush the log here */
> +        ret = -EINVAL;
> +        goto exit;
> +    }
> +
> +    s->log.sequence++;
> +    /* write new tail */
> +    s->log.tail = s->log.write;
> +
> +exit:
> +    qemu_vfree(buffer);
> +    qemu_vfree(merged_sector);
> +    return ret;
> +}
> +
> +/* Perform a log write, and then immediately flush the entire log */
> +int vhdx_log_write_and_flush(BlockDriverState *bs, BDRVVHDXState *s,
> +                             void *data, uint32_t length, uint64_t offset)
> +{
> +    int ret = 0;
> +    VHDXLogSequence logs = { .valid = true,
> +                             .count = 1,
> +                             .hdr = { 0 } };
> +
> +
> +    ret = vhdx_log_write(bs, s, data, length, offset);
> +    if (ret < 0) {
> +        goto exit;
> +    }
> +    logs.log = s->log;
> +
> +    ret = vhdx_log_flush(bs, s, &logs);
> +    if (ret < 0) {
> +        goto exit;
> +    }
> +
> +    s->log = logs.log;
> +
> +exit:
> +    return ret;
> +}
> +
> diff --git a/block/vhdx.h b/block/vhdx.h
> index 24b126e..b210efc 100644
> --- a/block/vhdx.h
> +++ b/block/vhdx.h
> @@ -393,6 +393,9 @@ bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset);
>  
>  int vhdx_parse_log(BlockDriverState *bs, BDRVVHDXState *s);
>  
> +int vhdx_log_write_and_flush(BlockDriverState *bs, BDRVVHDXState *s,
> +                             void *data, uint32_t length, uint64_t offset);
> +
>  static inline void leguid_to_cpus(MSGUID *guid)
>  {
>      le32_to_cpus(&guid->data1);
> -- 
> 1.8.1.4
> 
> 

-- 
Fam


* Re: [Qemu-devel] [PATCH 9/9] block: vhdx write support
  2013-07-24 17:54 ` [Qemu-devel] [PATCH 9/9] block: vhdx " Jeff Cody
@ 2013-07-30  4:10   ` Fam Zheng
  0 siblings, 0 replies; 19+ messages in thread
From: Fam Zheng @ 2013-07-30  4:10 UTC (permalink / raw)
  To: Jeff Cody; +Cc: kwolf, qemu-devel, stefanha

On Wed, 07/24 13:54, Jeff Cody wrote:
> This adds support for writing to VHDX image files, using coroutines.
> Writes into the BAT table go through the VHDX log.  Currently, BAT
> table writes occur when expanding a dynamic VHDX file, and allocating a
> new BAT entry.
> 
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
>  block/vhdx.c | 149 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 147 insertions(+), 2 deletions(-)
> 
> diff --git a/block/vhdx.c b/block/vhdx.c
> index a8dd6d7..791c6dc 100644
> --- a/block/vhdx.c
> +++ b/block/vhdx.c
> @@ -831,7 +831,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
>          vhdx_update_headers(bs, s, false, NULL);
>      }
>  
> -    /* TODO: differencing files, write */
> +    /* TODO: differencing files */
>  
>      return 0;
>  fail:
> @@ -963,7 +963,45 @@ exit:
>      return ret;
>  }
>  
> +/*
> + * Allocate a new payload block at the end of the file.
> + *
> + * Allocation will happen at 1MB alignment inside the file
> + *
> + * Returns the file offset start of the new payload block
> + */
> +static int vhdx_allocate_block(BlockDriverState *bs, BDRVVHDXState *s,
> +                                    uint64_t *new_offset)
> +{
> +    *new_offset = bdrv_getlength(bs->file);
>  
> +    /* per the spec, the address for a block is in units of 1MB */
> +    if (*new_offset % (1024*1024)) {
> +        *new_offset = ((*new_offset >> 20) + 1) << 20;  /* round up to 1MB */

You can use the ROUND_UP() macro here:

*new_offset = ROUND_UP(*new_offset, 1 << 20);
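
A small sketch of why the two forms agree. The ROUND_UP definition below is a local stand-in written for this example (QEMU's own macro lives in its headers and may be defined differently); round_up_open_coded() and round_up_macro() are hypothetical wrappers.

```c
#include <stdint.h>

/* Local stand-in for illustration only; not copied from QEMU */
#define ROUND_UP(n, d) ((((n) + (d) - 1) / (d)) * (d))

#define MiB (1ULL << 20)

/* The open-coded form from the patch: only adjust unaligned offsets */
static uint64_t round_up_open_coded(uint64_t off)
{
    if (off % (1024 * 1024)) {
        off = ((off >> 20) + 1) << 20;
    }
    return off;
}

/* The suggested form: aligned values pass through unchanged, so no
 * explicit alignment check is needed */
static uint64_t round_up_macro(uint64_t off)
{
    return ROUND_UP(off, MiB);
}
```

Both leave an already aligned offset untouched and move any other offset to the next 1MB boundary.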

Fam


* Re: [Qemu-devel] [PATCH 4/9] block: vhdx - log support struct and defines
  2013-07-30  3:15   ` Fam Zheng
@ 2013-07-30 13:42     ` Jeff Cody
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff Cody @ 2013-07-30 13:42 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, qemu-devel, stefanha

On Tue, Jul 30, 2013 at 11:15:02AM +0800, Fam Zheng wrote:
> On Wed, 07/24 13:54, Jeff Cody wrote:
> > This adds some magic number defines, and internal structure
> > definitions for VHDX log replay support.
> > 
> > Signed-off-by: Jeff Cody <jcody@redhat.com>
> > ---
> >  block/vhdx.h | 21 ++++++++++++++++++++-
> >  1 file changed, 20 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block/vhdx.h b/block/vhdx.h
> > index c8d8593..2db6615 100644
> > --- a/block/vhdx.h
> > +++ b/block/vhdx.h
> > @@ -151,7 +151,10 @@ typedef struct QEMU_PACKED VHDXRegionTableEntry {
> >  
> >  
> >  /* ---- LOG ENTRY STRUCTURES ---- */
> > +#define VHDX_LOG_MIN_SIZE (1024*1024)
> > +#define VHDX_LOG_SECTOR_SIZE 4096
> >  #define VHDX_LOG_HDR_SIZE 64
> > +#define VHDX_LOG_SIGNATURE 0x65676f6c
> >  typedef struct QEMU_PACKED VHDXLogEntryHeader {
> >      uint32_t    signature;              /* "loge" in ASCII */
> >      uint32_t    checksum;               /* CRC-32C hash of the 64KB table */
> > @@ -174,7 +177,8 @@ typedef struct QEMU_PACKED VHDXLogEntryHeader {
> >  } VHDXLogEntryHeader;
> >  
> >  #define VHDX_LOG_DESC_SIZE 32
> > -
> > +#define VHDX_LOG_DESC_SIGNATURE 0x63736564
> > +#define VHDX_LOG_ZERO_SIGNATURE 0x6f72657a
> 
> Are these macros really used? I see "desc" and "zero" used to compare
> signatures.
>

They are used in the log write patch (when creating new log sectors).
Right now, we only create data sectors (nothing uses the zero sectors
yet), so only the _DESC_SIGNATURE is used.
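
For reference, a quick sketch showing that the numeric defines are the same four ASCII bytes the parser compares against, once stored little-endian. The three values are taken from the patch; sig_matches() is a hypothetical helper written just for this example.

```c
#include <stdint.h>
#include <string.h>

#define VHDX_LOG_SIGNATURE      0x65676f6c
#define VHDX_LOG_DESC_SIGNATURE 0x63736564
#define VHDX_LOG_ZERO_SIGNATURE 0x6f72657a

/* Hypothetical helper: lay out a 32-bit signature in little-endian byte
 * order (as it appears on disk) and compare it with a 4-character tag */
static int sig_matches(uint32_t sig, const char *tag)
{
    uint8_t bytes[4] = {
        (uint8_t)(sig & 0xff),
        (uint8_t)((sig >> 8) & 0xff),
        (uint8_t)((sig >> 16) & 0xff),
        (uint8_t)((sig >> 24) & 0xff),
    };
    return memcmp(bytes, tag, 4) == 0;
}
```

So sig_matches(VHDX_LOG_DESC_SIGNATURE, "desc") holds, and likewise for "loge" and "zero".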

Thanks,
Jeff


* Re: [Qemu-devel] [PATCH 7/9] block: vhdx - log parsing, replay, and flush support
  2013-07-30  3:48   ` Fam Zheng
@ 2013-07-30 13:58     ` Jeff Cody
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff Cody @ 2013-07-30 13:58 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, qemu-devel, stefanha

On Tue, Jul 30, 2013 at 11:48:15AM +0800, Fam Zheng wrote:
> On Wed, 07/24 13:54, Jeff Cody wrote:
> > This adds support for VHDX v0 logs, as specified in Microsoft's
> > VHDX Specification Format v1.00:
> > https://www.microsoft.com/en-us/download/details.aspx?id=34750
> > 
> > The following support is added:
> > 
> > * Log parsing, and validation - validate that an existing log
> >   is correct.
> > 
> > * Log search - search through an existing log, to find any valid
> >   sequence of entries.
> > 
> > * Log replay and flush - replay an existing log, and flush/clear
> >   the log when complete.
> > 
> > The VHDX log is a circular buffer, with elements (sectors) of 4KB.
> > 
> > > A log entry is a variable-length number of sectors that is
> > > comprised of a header and 'descriptors' that describe each sector.
> > > 
> > > A log may contain multiple entries, known as a log sequence.  In a log
> > sequence, each log entry immediately follows the previous entry, with an
> > incrementing sequence number.  There can only ever be one active and
> > valid sequence in the log.
> > 
> > Each log entry must match the file log GUID in order to be valid (along
> > with other criteria).  Once we have flushed all valid log entries, we
> > > mark the file log GUID as zero, which indicates a buffer with no
> > valid entries.
> > 
> > Signed-off-by: Jeff Cody <jcody@redhat.com>
> > ---
> >  block/Makefile.objs |   2 +-
> >  block/vhdx-log.c    | 734 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  block/vhdx.c        |  44 +---
> >  block/vhdx.h        |   7 +-
> >  4 files changed, 743 insertions(+), 44 deletions(-)
> >  create mode 100644 block/vhdx-log.c
> > 
> > diff --git a/block/Makefile.objs b/block/Makefile.objs
> > index e6f5d33..2fbd79a 100644
> > --- a/block/Makefile.objs
> > +++ b/block/Makefile.objs
> > @@ -2,7 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
> >  block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
> >  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
> >  block-obj-y += qed-check.o
> > -block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o
> > +block-obj-$(CONFIG_VHDX) += vhdx.o vhdx-endian.o vhdx-log.o
> >  block-obj-y += parallels.o blkdebug.o blkverify.o
> >  block-obj-y += snapshot.o qapi.o
> >  block-obj-$(CONFIG_WIN32) += raw-win32.o win32-aio.o
> > diff --git a/block/vhdx-log.c b/block/vhdx-log.c
> > new file mode 100644
> > index 0000000..89b9000
> > --- /dev/null
> > +++ b/block/vhdx-log.c
> > @@ -0,0 +1,734 @@
> > +/*
> > + * Block driver for Hyper-V VHDX Images
> > + *
> > + * Copyright (c) 2013 Red Hat, Inc.,
> > + *
> > + * Authors:
> > + *  Jeff Cody <jcody@redhat.com>
> > + *
> > + *  This is based on the "VHDX Format Specification v1.00", published 8/25/2012
> > + *  by Microsoft:
> > + *      https://www.microsoft.com/en-us/download/details.aspx?id=34750
> > + *
> > + * This file covers the functionality of the metadata log writing, parsing, and
> > + * replay.
> > + *
> > + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> > + * See the COPYING.LIB file in the top-level directory.
> > + *
> > + */
> > +#include "qemu-common.h"
> > +#include "block/block_int.h"
> > +#include "qemu/module.h"
> > +#include "block/vhdx.h"
> > +
> > +
> > +typedef struct VHDXLogSequence {
> > +    bool valid;
> > +    uint32_t count;
> > +    VHDXLogEntries log;
> > +    VHDXLogEntryHeader hdr;
> > +} VHDXLogSequence;
> > +
> > +typedef struct VHDXLogDescEntries {
> > +    VHDXLogEntryHeader hdr;
> > +    VHDXLogDescriptor desc[];
> > +} VHDXLogDescEntries;
> > +
> > +
> > +/* Returns true if the GUID is zero */
> > +static bool vhdx_log_guid_is_zero(MSGUID *guid)
> > +{
> > +    int i;
> > +    int ret = 0;
> > +
> > +    /* If either the log guid, or log length is zero,
> > +     * then a replay log is not present */
> > +    for (i = 0; i < sizeof(MSGUID); i++) {
> > +        ret |= ((uint8_t *) guid)[i];
> > +    }
> > +
> > +    return ret == 0;
> > +}
> > +
> > +/* The log located on the disk is circular buffer containing
> > + * sectors of 4096 bytes each.
> > + *
> > + * It is assumed for the read/write functions below that the
> > + * circular buffer scheme uses a 'one sector open' to indicate
> > + * the buffer is full.  Given the validation methods used for each
> > + * sector, this method should be compatible with other methods that
> > + * do not waste a sector.
> > + */
> > +
> > +
> > +/* Allow peeking at the hdr entry at the beginning of the current
> > + * read index, without advancing the read index */
> > +static int vhdx_log_peek_hdr(BlockDriverState *bs, VHDXLogEntries *log,
> > +                             VHDXLogEntryHeader *hdr)
> > +{
> > +    int ret = 0;
> > +    uint64_t offset;
> > +    uint32_t read;
> > +
> > +    assert(hdr != NULL);
> > +
> > +    /* peek is only support on sector boundaries */
> 
> s/support/supported/
>

Thanks

> > +    if (log->read % VHDX_LOG_SECTOR_SIZE) {
> > +        ret = -EFAULT;
> > +        goto exit;
> > +    }
> > +
> > +    read = log->read;
> > +    /* we are guaranteed that a) log sectors are 4096 bytes,
> > +     * and b) the log length is a multiple of 1MB. So, there
> > +     * is always a round number of sectors in the buffer */
> > +    if ((read + sizeof(VHDXLogEntryHeader)) > log->length) {
> > +        read = 0;
> > +    }
> > +
> > +    if (read == log->write) {
> > +        ret = -EINVAL;
> > +        goto exit;
> > +    }
> > +
> > +    offset = log->offset + read;
> > +
> > +    ret = bdrv_pread(bs->file, offset, hdr, sizeof(VHDXLogEntryHeader));
> > +    if (ret < 0) {
> > +        goto exit;
> > +    }
> > +
> > +exit:
> > +    return ret;
> > +}
> > +
> > +/* Index increment for log, based on sector boundaries */
> > +static int vhdx_log_inc_idx(uint32_t idx, uint64_t length)
> > +{
> > +    idx += VHDX_LOG_SECTOR_SIZE;
> > +    /* we are guaranteed that a) log sectors are 4096 bytes,
> > +     * and b) the log length is a multiple of 1MB. So, there
> > +     * is always a round number of sectors in the buffer */
> > +    return idx >= length ? 0 : idx;
> > +}
> > +
> > +
> > +/* Reset the log to empty */
> > +static void vhdx_log_reset(BlockDriverState *bs, BDRVVHDXState *s)
> > +{
> > +    MSGUID guid = { 0 };
> > +    s->log.read = s->log.write = 0;
> > +    /* a log guid of 0 indicates an empty log to any parser of v0
> > +     * VHDX logs */
> > +    vhdx_update_headers(bs, s, false, &guid);
> > +}
> > +
> > +/* Reads num_sectors from the log (all log sectors are 4096 bytes),
> > + * into buffer 'buffer'.  Upon return, *sectors_read will contain
> > + * the number of sectors successfully read.
> > + *
> > + * It is assumed that 'buffer' is already allocated, and of sufficient
> > + * size (i.e. >= 4096*num_sectors).
> > + *
> > + * If 'peek' is true, then the tail (read) pointer for the circular buffer is
> > + * not modified.
> > + *
> > + * 0 is returned on success, -errno otherwise.  */
> > +static int vhdx_log_read_sectors(BlockDriverState *bs, VHDXLogEntries *log,
> > +                                 uint32_t *sectors_read, void *buffer,
> > +                                 uint32_t num_sectors, bool peek)
> > +{
> > +    int ret = 0;
> > +    uint64_t offset;
> > +    uint32_t read;
> > +
> > +    read = log->read;
> > +
> > +    *sectors_read = 0;
> > +    while (num_sectors) {
> > +        if (read == log->write) {
> > +            /* empty */
> > +            break;
> > +        }
> > +        offset = log->offset + read;
> > +
> > +        ret = bdrv_pread(bs->file, offset, buffer, VHDX_LOG_SECTOR_SIZE);
> > +        if (ret < 0) {
> > +            goto exit;
> > +        }
> > +        read = vhdx_log_inc_idx(read, log->length);
> > +
> > +        *sectors_read = *sectors_read + 1;
> > +        num_sectors--;
> > +    }
> > +
> > +exit:
> > +    if (!peek) {
> > +        log->read = read;
> > +    }
> > +    return ret;
> > +}
> > +
> > +/* Validates a log entry header */
> > +static bool vhdx_log_hdr_is_valid(VHDXLogEntries *log, VHDXLogEntryHeader *hdr,
> > +                                  BDRVVHDXState *s)
> > +{
> > +    bool valid = false;
> > +
> > +    if (memcmp(&hdr->signature, "loge", 4)) {
> > +        goto exit;
> > +    }
> > +
> > +    /* if the individual entry length is larger than the whole log
> > +     * buffer, that is obviously invalid */
> > +    if (log->length < hdr->entry_length) {
> > +        goto exit;
> > +    }
> > +
> > +    /* length of entire entry must be in units of 4KB (log sector size) */
> > +    if (hdr->entry_length % (VHDX_LOG_SECTOR_SIZE)) {
> > +        goto exit;
> > +    }
> > +
> > +    /* per spec, sequence # must be > 0 */
> > +    if (hdr->sequence_number == 0) {
> > +        goto exit;
> > +    }
> > +
> > +    /* log entries are only valid if they match the file-wide log guid
> > +     * found in the active header */
> > +    if (!guid_eq(hdr->log_guid, s->headers[s->curr_header]->log_guid)) {
> > +        goto exit;
> > +    }
> > +
> > +    valid = true;
> > +
> > +exit:
> > +    return valid;
> > +}
> > +
> > +/*
> > + * Given a log header, this will validate the descriptors and the
> > + * corresponding data sectors (if applicable)
> > + *
> > + * Validation consists of:
> > + *      1. Making sure the sequence number matches the entry header
> > + *      2. Verifying a valid signature ('zero' or desc' for descriptors)
> 
> s/ desc'/ 'desc'/
>

Thanks

> > + *      3. File offset field is a multiple of 4KB
> > + *      4. If a data descriptor, the corresponding data sector
> > + *         has its signature ('data') and matching sequence number
> > + *
> > + * 'desc' is the data buffer containing the descriptor
> > + * hdr is the log entry header
> 
> Please use gtkdoc format:
> 
> @desc: the data buffer ...
> @hdr:  the log entry header
>

Sure, I can do that.
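
For illustration, a minimal sketch of what the gtk-doc style header could
look like on a descriptor check.  The Sketch* types, the sketch_* names and
the trimmed-down field layout are stand-ins for this example only, not the
definitions in vhdx.h.  The zero-length test is parenthesized because '!'
binds tighter than '%'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_LOG_SECTOR_SIZE 4096u  /* stand-in for VHDX_LOG_SECTOR_SIZE */

/* Hypothetical, trimmed-down stand-ins for the structs in vhdx.h */
typedef struct SketchLogEntryHeader {
    uint64_t sequence_number;
} SketchLogEntryHeader;

typedef struct SketchLogDescriptor {
    char     signature[4];     /* "zero" or "desc" */
    uint64_t zero_length;      /* only meaningful for "zero" descriptors */
    uint64_t file_offset;
    uint64_t sequence_number;
} SketchLogDescriptor;

/**
 * sketch_log_desc_is_valid:
 * @desc: the descriptor to validate
 * @hdr:  the log entry header the descriptor belongs to
 *
 * Checks the sequence number, the 4KB alignment of the file offset,
 * and the signature ("zero" or "desc").
 *
 * Returns: true if valid
 */
static bool sketch_log_desc_is_valid(const SketchLogDescriptor *desc,
                                     const SketchLogEntryHeader *hdr)
{
    assert(desc != NULL && hdr != NULL);

    if (desc->sequence_number != hdr->sequence_number) {
        return false;
    }
    if (desc->file_offset % SKETCH_LOG_SECTOR_SIZE) {
        return false;
    }
    if (!memcmp(desc->signature, "zero", 4)) {
        /* parenthesized: '!' binds tighter than '%' */
        return !(desc->zero_length % SKETCH_LOG_SECTOR_SIZE);
    }
    return !memcmp(desc->signature, "desc", 4);
}

/* Hypothetical helper, only here to build descriptors for examples */
static SketchLogDescriptor sketch_make_desc(const char *sig,
                                            uint64_t zero_length,
                                            uint64_t file_offset,
                                            uint64_t seq)
{
    SketchLogDescriptor d = { 0 };

    memcpy(d.signature, sig, 4);
    d.zero_length = zero_length;
    d.file_offset = file_offset;
    d.sequence_number = seq;
    return d;
}
```

With this layout, a "zero" descriptor whose length is 100 bytes is rejected,
while one covering 8192 bytes is accepted.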

> > + *
> > + * Returns true if valid
> > + */
> > +static bool vhdx_log_desc_is_valid(VHDXLogDescriptor *desc,
> > +                                   VHDXLogEntryHeader *hdr)
> > +{
> > +    bool ret = false;
> > +
> > +    if (desc->sequence_number != hdr->sequence_number) {
> > +        goto exit;
> > +    }
> > +    if (desc->file_offset % VHDX_LOG_SECTOR_SIZE) {
> > +        goto exit;
> > +    }
> > +
> > +    if (!memcmp(&desc->signature, "zero", 4)) {
> > +        if (!(desc->zero_length % VHDX_LOG_SECTOR_SIZE)) {
> > +            /* valid */
> > +            ret = true;
> > +        }
> > +    } else if (!memcmp(&desc->signature, "desc", 4)) {
> > +            /* valid */
> > +            ret = true;
> > +    }
> > +
> > +exit:
> > +    return ret;
> > +}
> > +
> > +
> > +/* Prior to sector data for a log entry, there is the header
> > + * and the descriptors referenced in the header:
> > + *
> > + * [] = 4KB sector
> > + *
> > + * [ hdr, desc ][   desc   ][ ... ][ data ][ ... ]
> > + *
> > + * The first sector in a log entry has a 64 byte header, and
> > + * up to 126 32-byte descriptors.  If more descriptors than
> > + * 126 are required, then subsequent sectors can have up to 128
> > + * descriptors.  Each sector is 4KB.  Data follows the descriptor
> > + * sectors.
> > + *
> > + * This will return the number of sectors needed to encompass
> > + * the passed number of descriptors in desc_cnt.
> > + *
> > + * This will never return 0, even if desc_cnt is 0.
> > + */
> > +static int vhdx_compute_desc_sectors(uint32_t desc_cnt)
> > +{
> > +    uint32_t desc_sectors;
> > +
> > +    desc_cnt += 2; /* account for header in first sector */
> > +    desc_sectors = desc_cnt / 128;
> > +    if (desc_cnt % 128) {
> > +        desc_sectors++;
> > +    }
> > +
> > +    return desc_sectors;
> > +}
> > +
> > +
> > +/* Reads the log header, and subsequent descriptors (if any).  This
> > + * will allocate all the space for buffer, which must be NULL when
> > + * passed into this function. Each descriptor will also be validated,
> > + * and error returned if any are invalid. */
> > +static int vhdx_log_read_desc(BlockDriverState *bs, BDRVVHDXState *s,
> > +                              VHDXLogEntries *log, VHDXLogDescEntries **buffer)
> > +{
> > +    int ret = 0;
> > +    uint32_t desc_sectors;
> > +    uint32_t sectors_read;
> > +    VHDXLogEntryHeader hdr;
> > +    VHDXLogDescEntries *desc_entries = NULL;
> > +    int i;
> > +
> > +    assert(*buffer == NULL);
> > +
> > +    ret = vhdx_log_peek_hdr(bs, log, &hdr);
> > +    if (ret < 0) {
> > +        goto exit;
> > +    }
> > +    vhdx_log_entry_hdr_le_import(&hdr);
> > +    if (vhdx_log_hdr_is_valid(log, &hdr, s) == false) {
> > +        ret = -EINVAL;
> > +        goto exit;
> > +    }
> > +
> > +    desc_sectors = vhdx_compute_desc_sectors(hdr.descriptor_count);
> > +    desc_entries = qemu_blockalign(bs, desc_sectors * VHDX_LOG_SECTOR_SIZE);
> > +
> > +    ret = vhdx_log_read_sectors(bs, log, &sectors_read, desc_entries,
> > +                                desc_sectors, false);
> > +    if (ret < 0) {
> > +        goto free_and_exit;
> > +    }
> > +    if (sectors_read != desc_sectors) {
> > +        ret = -EINVAL;
> > +        goto free_and_exit;
> > +    }
> > +
> > +    /* put in proper endianness, and validate each desc */
> > +    for (i = 0; i < hdr.descriptor_count; i++) {
> > +        vhdx_log_desc_le_import(&desc_entries->desc[i]);
> > +        if (vhdx_log_desc_is_valid(&desc_entries->desc[i], &hdr) == false) {
> > +            ret = -EINVAL;
> > +            goto free_and_exit;
> > +        }
> > +    }
> > +
> > +    *buffer = desc_entries;
> > +    goto exit;
> > +
> > +free_and_exit:
> > +    qemu_vfree(desc_entries);
> > +exit:
> > +    return ret;
> > +}
> > +
> > +
> > +/* Flushes the descriptor described by desc to the VHDX image file.
> > + * If the descriptor is a data descriptor, then 'data' must be non-NULL,
> > + * and >= 4096 bytes (VHDX_LOG_SECTOR_SIZE), containing the data to be
> > + * written.
> > + *
> > + * Verification is performed to make sure the sequence numbers of a data
> > + * descriptor match the sequence number in the desc.
> > + *
> > + * For a zero descriptor, it may describe multiple sectors to fill with zeroes.
> > + * In this case, it should be noted that zeroes are written to disk, and the
> > + * image file is not extended as a sparse file.  */
> > +static int vhdx_log_flush_desc(BlockDriverState *bs, VHDXLogDescriptor *desc,
> > +                               VHDXLogDataSector *data)
> > +{
> > +    int ret = 0;
> > +    uint64_t seq, file_offset;
> > +    uint32_t offset = 0;
> > +    void *buffer = NULL;
> > +    uint64_t count = 1;
> > +    int i;
> > +
> > +    buffer = qemu_blockalign(bs, VHDX_LOG_SECTOR_SIZE);
> > +
> > +    if (!memcmp(&desc->signature, "desc", 4)) {
> > +        /* data sector */
> > +        if (data == NULL) {
> > +            ret = -EFAULT;
> > +            goto exit;
> > +        }
> > +
> > +        /* The sequence number of the data sector must match that
> > +         * in the descriptor */
> > +        seq = data->sequence_high;
> > +        seq <<= 32;
> > +        seq |= data->sequence_low & 0xffffffff;
> > +
> > +        if (seq != desc->sequence_number) {
> > +            ret = -EINVAL;
> > +            goto exit;
> > +        }
> > +
> > +        /* Each data sector is in total 4096 bytes, however the first
> > +         * 8 bytes, and last 4 bytes, are located in the descriptor */
> > +        memcpy(buffer, &desc->leading_bytes, sizeof(desc->leading_bytes));
> > +        offset += sizeof(desc->leading_bytes);
> > +
> > +        memcpy(buffer+offset, data->data, 4084);
> > +        offset += 4084;
> 
> Could you use sizeof(data->data) instead of 4084?
>

I suppose I should be consistent, and either use the explicit sizes,
or use 'sizeof'.  I was on the fence on whether it is better for these
three variables to use sizeof() or explicit sizes.
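
A rough sketch of the sizeof() variant, for comparison.  The Sketch* structs
below are stand-ins that follow the field sizes discussed here (8 leading
bytes, 4084 data bytes, 4 trailing bytes); they are assumptions for this
example, not the vhdx.h definitions.  A compile-time check ties the three
pieces back to the 4096 byte sector size, so the explicit numbers only
appear once.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_LOG_SECTOR_SIZE 4096  /* stand-in for VHDX_LOG_SECTOR_SIZE */

/* Hypothetical stand-ins for VHDXLogDescriptor / VHDXLogDataSector */
typedef struct SketchLogDescriptor {
    uint32_t signature;
    uint32_t trailing_bytes;   /* last 4 bytes of the sector */
    uint64_t leading_bytes;    /* first 8 bytes of the sector */
    uint64_t file_offset;
    uint64_t sequence_number;
} SketchLogDescriptor;

typedef struct SketchLogDataSector {
    uint32_t data_signature;
    uint32_t sequence_high;
    uint8_t  data[4084];       /* middle of the sector */
    uint32_t sequence_low;
} SketchLogDataSector;

/* The three pieces must add up to exactly one log sector */
_Static_assert(sizeof(((SketchLogDescriptor *)0)->leading_bytes) +
               sizeof(((SketchLogDataSector *)0)->data) +
               sizeof(((SketchLogDescriptor *)0)->trailing_bytes)
               == SKETCH_LOG_SECTOR_SIZE,
               "leading + data + trailing must equal the log sector size");

/* Reassemble a full sector into 'out', which must hold
 * SKETCH_LOG_SECTOR_SIZE bytes */
static void sketch_rebuild_sector(uint8_t *out,
                                  const SketchLogDescriptor *desc,
                                  const SketchLogDataSector *data)
{
    size_t offset = 0;

    assert(out != NULL && desc != NULL && data != NULL);

    memcpy(out + offset, &desc->leading_bytes, sizeof(desc->leading_bytes));
    offset += sizeof(desc->leading_bytes);

    memcpy(out + offset, data->data, sizeof(data->data));
    offset += sizeof(data->data);

    memcpy(out + offset, &desc->trailing_bytes,
           sizeof(desc->trailing_bytes));
}
```

The trade-off is readability of the on-disk layout (explicit sizes) versus
not having to keep the constants in sync by hand (sizeof).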

> > +
> > +        memcpy(buffer+offset, &desc->trailing_bytes,
> > +               sizeof(desc->trailing_bytes));
> > +
> > +    } else if (!memcmp(&desc->signature, "zero", 4)) {
> > +        /* write 'count' sectors of zeroes */
> > +        memset(buffer, 0, VHDX_LOG_SECTOR_SIZE);
> > +        count = desc->zero_length / VHDX_LOG_SECTOR_SIZE;
> > +    }
> > +
> > +    file_offset = desc->file_offset;
> > +
> > +    /* count is only > 1 if we are writing zeroes */
> > +    for (i = 0; i < count; i++) {
> > +        ret = bdrv_pwrite_sync(bs->file, file_offset, buffer,
> > +                               VHDX_LOG_SECTOR_SIZE);
> > +        if (ret < 0) {
> > +            goto exit;
> > +        }
> > +        file_offset += VHDX_LOG_SECTOR_SIZE;
> > +    }
> > +
> > +exit:
> > +    qemu_vfree(buffer);
> > +    return ret;
> > +}
> > +
> > +/* Flush the entire log (as described by 'logs') to the VHDX image
> > + * file, and then set the log to 'empty' status once complete.
> > + *
> > + * The log entries should be validated prior to flushing */
> > +static int vhdx_log_flush(BlockDriverState *bs, BDRVVHDXState *s,
> > +                          VHDXLogSequence *logs)
> > +{
> > +    int ret = 0;
> > +    int i;
> > +    uint32_t cnt, sectors_read;
> > +    uint64_t new_file_size;
> > +    void *data = NULL;
> > +    VHDXLogDescEntries *desc_entries = NULL;
> > +    VHDXLogEntryHeader hdr_tmp = { 0 };
> > +
> > +    cnt = logs->count;
> > +
> > +    data = qemu_blockalign(bs, VHDX_LOG_SECTOR_SIZE);
> > +
> > +    vhdx_user_visible_write(bs, s);
> > +
> > +    /* each iteration represents one log sequence, which may span multiple
> > +     * sectors */
> > +    while (cnt--) {
> > +        ret = vhdx_log_peek_hdr(bs, &logs->log, &hdr_tmp);
> > +        if (ret < 0) {
> > +            goto exit;
> > +        }
> > +        /* if the log shows a FlushedFileOffset larger than our current file
> > +         * size, then that means the file has been truncated / corrupted, and
> > +         * we must refuse to open it / use it */
> > +        if (hdr_tmp.flushed_file_offset > bdrv_getlength(bs->file)) {
> > +            ret = -EINVAL;
> > +            goto exit;
> > +        }
> > +
> > +        ret = vhdx_log_read_desc(bs, s, &logs->log, &desc_entries);
> > +        if (ret < 0) {
> > +            goto exit;
> > +        }
> > +
> > +        for (i = 0; i < desc_entries->hdr.descriptor_count; i++) {
> > +            if (!memcmp(&desc_entries->desc[i].signature, "desc", 4)) {
> > +                /* data sector, so read a sector to flush */
> > +                ret = vhdx_log_read_sectors(bs, &logs->log, &sectors_read,
> > +                                            data, 1, false);
> > +                if (ret < 0) {
> > +                    goto exit;
> > +                }
> > +                if (sectors_read != 1) {
> > +                    ret = -EINVAL;
> > +                    goto exit;
> > +                }
> > +            }
> > +
> > +            ret = vhdx_log_flush_desc(bs, &desc_entries->desc[i], data);
> > +            if (ret < 0) {
> > +                goto exit;
> > +            }
> > +        }
> > +        if (bdrv_getlength(bs->file) < desc_entries->hdr.last_file_offset) {
> > +            new_file_size = desc_entries->hdr.last_file_offset;
> > +            if (new_file_size % (1024*1024)) {
> > +                /* round up to nearest 1MB boundary */
> > +                new_file_size = ((new_file_size >> 20) + 1) << 20;
> > +                bdrv_truncate(bs->file, new_file_size);
> > +            }
> > +        }
> > +        qemu_vfree(desc_entries);
> > +        desc_entries = NULL;
> > +    }
> > +
> > +    /* once the log is fully flushed, indicate that we have an empty log
> > +     * now.  This also sets the log guid to 0, to indicate an empty log */
> > +    vhdx_log_reset(bs, s);
> > +
> > +exit:
> > +    qemu_vfree(data);
> > +    qemu_vfree(desc_entries);
> > +    return ret;
> > +}
> > +
> > +static int vhdx_validate_log_entry(BlockDriverState *bs, BDRVVHDXState *s,
> > +                                   VHDXLogEntries *log, uint64_t seq,
> > +                                   bool *valid, VHDXLogEntryHeader *entry)
> > +{
> > +    int ret = 0;
> > +    VHDXLogEntryHeader hdr;
> > +    void *buffer = NULL;
> > +    uint32_t i, desc_sectors, total_sectors, crc;
> > +    uint32_t sectors_read = 0;
> > +    VHDXLogDescEntries *desc_buffer = NULL;
> > +
> > +    *valid = false;
> > +
> > +    ret = vhdx_log_peek_hdr(bs, log, &hdr);
> > +    if (ret < 0) {
> > +        goto inc_and_exit;
> > +    }
> > +
> > +    vhdx_log_entry_hdr_le_import(&hdr);
> > +
> > +
> > +    if (vhdx_log_hdr_is_valid(log, &hdr, s) == false) {
> > +        goto inc_and_exit;
> > +    }
> > +
> > +    if (seq > 0) {
> > +        if (hdr.sequence_number != seq + 1) {
> > +            goto inc_and_exit;
> > +        }
> > +    }
> > +
> > +    desc_sectors = vhdx_compute_desc_sectors(hdr.descriptor_count);
> > +
> > +    /* Read desc sectors, and calculate log checksum */
> > +
> > +    total_sectors = hdr.entry_length / VHDX_LOG_SECTOR_SIZE;
> > +
> > +
> > +    /* read_desc() will increment the read idx */
> > +    ret = vhdx_log_read_desc(bs, s, log, &desc_buffer);
> > +    if (ret < 0) {
> > +        goto free_and_exit;
> > +    }
> > +
> > +    crc = vhdx_checksum_calc(0xffffffff, (void *)desc_buffer,
> > +                            desc_sectors * VHDX_LOG_SECTOR_SIZE, 4);
> > +    crc ^= 0xffffffff;
> > +
> > +    buffer = qemu_blockalign(bs, VHDX_LOG_SECTOR_SIZE);
> > +    if (total_sectors > desc_sectors) {
> > +        for (i = 0; i < total_sectors - desc_sectors; i++) {
> > +            sectors_read = 0;
> > +            ret = vhdx_log_read_sectors(bs, log, &sectors_read, buffer,
> > +                                        1, false);
> > +            if (ret < 0 || sectors_read != 1) {
> > +                goto free_and_exit;
> > +            }
> > +            crc = vhdx_checksum_calc(crc, buffer, VHDX_LOG_SECTOR_SIZE, -1);
> > +            crc ^= 0xffffffff;
> > +        }
> > +    }
> > +    crc ^= 0xffffffff;
> > +    if (crc != desc_buffer->hdr.checksum) {
> > +        goto free_and_exit;
> > +    }
> > +
> > +    *valid = true;
> > +    *entry = hdr;
> > +    goto free_and_exit;
> > +
> > +inc_and_exit:
> > +    log->read = vhdx_log_inc_idx(log->read, log->length);
> > +
> > +free_and_exit:
> > +    qemu_vfree(buffer);
> > +    qemu_vfree(desc_buffer);
> > +    return ret;
> > +}
> > +
> > +/* Search through the log circular buffer, and find the valid, active
> > + * log sequence, if any exists
> > + * */
> > +static int vhdx_log_search(BlockDriverState *bs, BDRVVHDXState *s,
> > +                           VHDXLogSequence *logs)
> > +{
> > +    int ret = 0;
> > +
> > +    uint64_t curr_seq = 0;
> > +    VHDXLogSequence candidate = { 0 };
> > +    VHDXLogSequence current = { 0 };
> > +
> > +    uint32_t tail;
> > +    bool seq_valid = false;
> > +    VHDXLogEntryHeader hdr = { 0 };
> > +    VHDXLogEntries curr_log;
> > +
> > +    memcpy(&curr_log, &s->log, sizeof(VHDXLogEntries));
> > +    curr_log.write = curr_log.length;   /* assume log is full */
> > +    curr_log.read = 0;
> > +
> > +
> > +    /* now we will go through the whole log sector by sector, until
> > +     * we find a valid, active log sequence, or reach the end of the
> > +     * log buffer */
> > +    for (;;) {
> > +        tail = curr_log.read;
> > +
> > +        curr_seq = 0;
> > +        memset(&current, 0, sizeof(current));
> > +
> > +        ret = vhdx_validate_log_entry(bs, s, &curr_log, curr_seq,
> > +                                      &seq_valid, &hdr);
> > +        if (ret < 0) {
> > +            goto exit;
> > +        }
> > +
> > +        if (seq_valid) {
> > +            current.valid     = true;
> > +            current.log       = curr_log;
> > +            current.log.read  = tail;
> > +            current.log.write = curr_log.read;
> > +            current.count     = 1;
> > +            current.hdr       = hdr;
> > +
> > +
> > +            for (;;) {
> > +                ret = vhdx_validate_log_entry(bs, s, &curr_log, curr_seq,
> > +                                              &seq_valid, &hdr);
> > +                if (ret < 0) {
> > +                    goto exit;
> > +                }
> > +                if (seq_valid == false) {
> > +                    break;
> > +                }
> > +                current.log.write = curr_log.read;
> > +                current.count++;
> > +
> > +                curr_seq = hdr.sequence_number;
> > +            }
> > +        }
> > +
> > +        if (current.valid) {
> > +            if (candidate.valid == false ||
> > +                current.hdr.sequence_number > candidate.hdr.sequence_number) {
> > +                candidate = current;
> > +            }
> > +        }
> > +
> > +        if (curr_log.read < tail) {
> > +            break;
> > +        }
> > +    }
> > +
> > +    *logs = candidate;
> > +
> > +    if (candidate.valid) {
> > +        /* this is the next sequence number, for writes */
> > +        s->log.sequence = candidate.hdr.sequence_number + 1;
> > +    }
> > +
> > +
> > +exit:
> > +    return ret;
> > +}
> > +
> > +/* Parse the replay log.  Per the VHDX spec, if the log is present
> > + * it must be replayed prior to opening the file, even read-only.
> > + *
> > + * If read-only, we must replay the log in RAM (or refuse to open
> > + * a dirty VHDX file read-only) */
> > +int vhdx_parse_log(BlockDriverState *bs, BDRVVHDXState *s)
> > +{
> > +    int ret = 0;
> > +    VHDXHeader *hdr;
> > +    VHDXLogSequence logs = { 0 };
> > +
> > +    hdr = s->headers[s->curr_header];
> > +
> > +    /* s->log.hdr is freed in vhdx_close() */
> > +    if (s->log.hdr == NULL) {
> > +        s->log.hdr = qemu_blockalign(bs, sizeof(VHDXLogEntryHeader));
> > +    }
> > +
> > +    s->log.offset = hdr->log_offset;
> > +    s->log.length = hdr->log_length;
> > +
> > +    if (s->log.offset < VHDX_LOG_MIN_SIZE ||
> > +        s->log.offset % VHDX_LOG_MIN_SIZE) {
> > +        ret = -EINVAL;
> > +        goto exit;
> > +    }
> > +
> > +    /* per spec, only log version of 0 is supported */
> > +    if (hdr->log_version != 0) {
> > +        ret = -EINVAL;
> > +        goto exit;
> > +    }
> > +
> > +    /* If either the log guid, or log length is zero,
> > +     * then a replay log is not present */
> > +    if (vhdx_log_guid_is_zero(&hdr->log_guid)) {
> > +        goto exit;
> > +    }
> > +
> > +
> Too many blank lines here.
> > +
> > +    if (hdr->log_length == 0) {
> > +        goto exit;
> > +    }
> > +
> > +    if (hdr->log_length % VHDX_LOG_MIN_SIZE) {
> > +        ret = -EINVAL;
> > +        goto exit;
> > +    }
> > +
> > +
> > +    /* The log is present, we need to find if and where there is an active
> > +     * sequence of valid entries present in the log.  */
> > +
> > +    ret = vhdx_log_search(bs, s, &logs);
> > +    if (ret < 0) {
> > +        goto exit;
> > +    }
> > +
> > +    if (logs.valid) {
> > +        /* now flush the log */
> > +        ret = vhdx_log_flush(bs, s, &logs);
> 
> Does it flush log regardless of read-only open?
> 

Yes - that will cause an error return to go up the stack once the
write is attempted.

Alternatively, I could return error if the image file has a dirty log,
and is opened read-only (in that case, could return -ENOTSUP).
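
As a sketch of that alternative, assuming a hypothetical helper that is
consulted before any flush is attempted (the name and the return convention
are made up for this example):

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical policy helper, not part of the patch.
 *
 * Returns 0 if there is nothing to replay, 1 if the caller should go
 * ahead and flush the log, and -ENOTSUP if the log is dirty but the
 * image was opened read-only. */
static int sketch_replay_policy(bool log_is_dirty, bool read_only)
{
    if (!log_is_dirty) {
        return 0;
    }
    if (read_only) {
        /* refuse up front, instead of failing on the first write */
        return -ENOTSUP;
    }
    return 1;
}
```

The difference from the current behavior is only where the error shows up:
before any I/O, rather than from the first write issued during the flush.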

> > +    }
> > +
> > +
> > +exit:
> > +    return ret;
> > +}
> > +
> > diff --git a/block/vhdx.c b/block/vhdx.c
> > index f5689c3..a8dd6d7 100644
> > --- a/block/vhdx.c
> > +++ b/block/vhdx.c
> > @@ -735,48 +735,6 @@ exit:
> >      return ret;
> >  }
> >  
> > -/* Parse the replay log.  Per the VHDX spec, if the log is present
> > - * it must be replayed prior to opening the file, even read-only.
> > - *
> > - * If read-only, we must replay the log in RAM (or refuse to open
> > - * a dirty VHDX file read-only */
> > -static int vhdx_parse_log(BlockDriverState *bs, BDRVVHDXState *s)
> > -{
> > -    int ret = 0;
> > -    int i;
> > -    VHDXHeader *hdr;
> > -
> > -    hdr = s->headers[s->curr_header];
> > -
> > -    /* either the log guid, or log length is zero,
> > -     * then a replay log is present */
> > -    for (i = 0; i < sizeof(hdr->log_guid.data4); i++) {
> > -        ret |= hdr->log_guid.data4[i];
> > -    }
> > -    if (hdr->log_guid.data1 == 0 &&
> > -        hdr->log_guid.data2 == 0 &&
> > -        hdr->log_guid.data3 == 0 &&
> > -        ret == 0) {
> > -        goto exit;
> > -    }
> > -
> > -    /* per spec, only log version of 0 is supported */
> > -    if (hdr->log_version != 0) {
> > -        ret = -EINVAL;
> > -        goto exit;
> > -    }
> > -
> > -    if (hdr->log_length == 0) {
> > -        goto exit;
> > -    }
> > -
> > -    /* We currently do not support images with logs to replay */
> > -    ret = -ENOTSUP;
> > -
> > -exit:
> > -    return ret;
> > -}
> > -
> >  
> >  static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
> >  {
> > @@ -789,6 +747,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags)
> >  
> >      s->bat = NULL;
> >      s->first_visible_write = true;
> > +    s->log.write = s->log.read = 0;
> >  
> >      qemu_co_mutex_init(&s->lock);
> >  
> > @@ -1030,6 +989,7 @@ static void vhdx_close(BlockDriverState *bs)
> >      qemu_vfree(s->headers[1]);
> >      qemu_vfree(s->bat);
> >      qemu_vfree(s->parent_entries);
> > +    qemu_vfree(s->log.hdr);
> >  }
> >  
> >  static BlockDriver bdrv_vhdx = {
> > diff --git a/block/vhdx.h b/block/vhdx.h
> > index cb3ce0e..24b126e 100644
> > --- a/block/vhdx.h
> > +++ b/block/vhdx.h
> > @@ -326,7 +326,11 @@ typedef struct VHDXMetadataEntries {
> >  typedef struct VHDXLogEntries {
> >      uint64_t offset;
> >      uint64_t length;
> > -    uint32_t head;
> > +    uint32_t write;
> > +    uint32_t read;
> > +    VHDXLogEntryHeader *hdr;
> > +    void *desc_buffer;
> > +    uint64_t sequence;
> >      uint32_t tail;
> >  } VHDXLogEntries;
> >  
> > @@ -387,6 +391,7 @@ uint32_t vhdx_checksum_calc(uint32_t crc, uint8_t *buf, size_t size,
> >  
> >  bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset);
> >  
> > +int vhdx_parse_log(BlockDriverState *bs, BDRVVHDXState *s);
> >  
> >  static inline void leguid_to_cpus(MSGUID *guid)
> >  {
> > -- 
> > 1.8.1.4
> > 
> > 
> 
> -- 
> Fam


* Re: [Qemu-devel] [PATCH 8/9] block: vhdx - add log write support
  2013-07-30  3:57   ` Fam Zheng
@ 2013-07-30 14:11     ` Jeff Cody
  0 siblings, 0 replies; 19+ messages in thread
From: Jeff Cody @ 2013-07-30 14:11 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, qemu-devel, stefanha

On Tue, Jul 30, 2013 at 11:57:20AM +0800, Fam Zheng wrote:
> On Wed, 07/24 13:54, Jeff Cody wrote:
> > This adds support for writing to the VHDX log.
> > 
> > For spec details, see VHDX Specification Format v1.00:
> > https://www.microsoft.com/en-us/download/details.aspx?id=34750
> > 
> > There are a few limitations to this log support:
> > 1.) There is no caching yet
> > 2.) The log is flushed after each entry
> > 
> > The primary write interface, vhdx_log_write_and_flush(), performs a log
> > write followed by an immediate flush of the log.
> > 
> > As each log entry sector is a minimum of 4KB, partial sector writes are
> > filled in with data from the disk write destination.
> > 
> > If the current file log GUID is 0, a new GUID is generated and updated
> > in the header.
> > 
> > Signed-off-by: Jeff Cody <jcody@redhat.com>
> > ---
> >  block/vhdx-log.c | 273 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  block/vhdx.h     |   3 +
> >  2 files changed, 276 insertions(+)
> > 
> > diff --git a/block/vhdx-log.c b/block/vhdx-log.c
> > index 89b9000..786b393 100644
> > --- a/block/vhdx-log.c
> > +++ b/block/vhdx-log.c
> > @@ -170,6 +170,53 @@ exit:
> >      return ret;
> >  }
> >  
> > +/* Writes num_sectors to the log (all log sectors are 4096 bytes),
> > + * from buffer 'buffer'.  Upon return, *sectors_written will contain
> > + * the number of sectors successfully written.
> > + *
> > + * It is assumed that 'buffer' is at least 4096*num_sectors large.
> > + *
> > + * 0 is returned on success, -errno otherwise */
> > +static int vhdx_log_write_sectors(BlockDriverState *bs, VHDXLogEntries *log,
> > +                                  uint32_t *sectors_written, void *buffer,
> > +                                  uint32_t num_sectors)
> > +{
> > +    int ret = 0;
> > +    uint64_t offset;
> > +    uint32_t write;
> > +    void *buffer_tmp;
> > +    BDRVVHDXState *s = bs->opaque;
> > +
> > +    vhdx_user_visible_write(bs, s);
> > +
> > +    write = log->write;
> > +
> > +    buffer_tmp = buffer;
> > +    while (num_sectors) {
> > +
> > +        offset = log->offset + write;
> > +        write = vhdx_log_inc_idx(write, log->length);
> > +        if (write == log->read) {
> > +            /* full */
> > +            break;
> > +        }
> > +        ret = bdrv_pwrite_sync(bs->file, offset, buffer_tmp,
> > +                               VHDX_LOG_SECTOR_SIZE);
> > +        if (ret < 0) {
> > +            goto exit;
> > +        }
> > +        buffer_tmp += VHDX_LOG_SECTOR_SIZE;
> > +
> > +        log->write = write;
> > +        *sectors_written = *sectors_written + 1;
> > +        num_sectors--;
> > +    }
> > +
> > +exit:
> > +    return ret;
> > +}
> > +
> > +
> >  /* Validates a log entry header */
> >  static bool vhdx_log_hdr_is_valid(VHDXLogEntries *log, VHDXLogEntryHeader *hdr,
> >                                    BDRVVHDXState *s)
> > @@ -732,3 +779,229 @@ exit:
> >      return ret;
> >  }
> >  
> > +
> > +
> > +static void vhdx_log_raw_to_le_sector(VHDXLogDescriptor *desc,
> > +                                      VHDXLogDataSector *sector, void *data,
> > +                                      uint64_t seq)
> > +{
> > +    memcpy(&desc->leading_bytes, data, 8);
> > +    data += 8;
> > +    cpu_to_le64s(&desc->leading_bytes);
> > +    memcpy(sector->data, data, 4084);
> > +    data += 4084;
> > +    memcpy(&desc->trailing_bytes, data, 4);
> > +    cpu_to_le32s(&desc->trailing_bytes);
> > +    data += 4;
> > +
> > +    sector->sequence_high  = (uint32_t) (seq >> 32);
> > +    sector->sequence_low   = (uint32_t) (seq & 0xffffffff);
> > +    sector->data_signature = VHDX_LOG_DATA_SIGNATURE;
> > +
> > +    vhdx_log_desc_le_export(desc);
> > +    vhdx_log_data_le_export(sector);
> > +}
> > +
> > +
> > +static int vhdx_log_write(BlockDriverState *bs, BDRVVHDXState *s,
> > +                          void *data, uint32_t length, uint64_t offset)
> > +{
> > +    int ret = 0;
> > +    void *buffer = NULL;
> > +    void *merged_sector = NULL;
> > +    void *data_tmp, *sector_write;
> > +    unsigned int i;
> > +    int sector_offset;
> > +    uint32_t desc_sectors, sectors, total_length;
> > +    uint32_t sectors_written = 0;
> > +    uint32_t aligned_length;
> > +    uint32_t leading_length = 0;
> > +    uint32_t trailing_length = 0;
> > +    uint32_t partial_sectors = 0;
> > +    uint32_t bytes_written = 0;
> > +    uint64_t file_offset;
> > +    VHDXHeader *header;
> > +    VHDXLogEntryHeader new_hdr;
> > +    VHDXLogDescriptor *new_desc = NULL;
> > +    VHDXLogDataSector *data_sector = NULL;
> > +    MSGUID new_guid = { 0 };
> > +
> > +    header = s->headers[s->curr_header];
> > +
> > +    /* need to have offset read data, and be on 4096 byte boundary */
> > +
> > +    if (length > header->log_length) {
> > +        /* no log present.  we could create a log here instead of failing */
> 
> Does newly created vhdx have allocated log sectors?
> 

I don't know of any way to make Hyper-V create a file without an
allocated log area (I believe with the files I've generated, it
allocates a 1MB log between the header and the BAT region).

The spec says that "LogLength" in the header should be a multiple of
1MB.  And technically, 0 is a multiple of every number, so when
parsing the header I don't fail out on a zero-length log.  In
practice, I don't think Hyper-V creates files with zero-length logs,
but I don't think the spec rules it out.

So we could either allocate a log in the file at this point, or fail.
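
To make the two checks concrete, a small sketch (the constant and function
names are placeholders, not the ones in the patch): the header check accepts
a length of 0 because 0 % 1MB == 0, and it is the write path that then
rejects anything that cannot fit in the log.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_LOG_MIN_SIZE (1024 * 1024)  /* 1MB granularity, per the spec */

/* Header parsing: the log length must be a multiple of 1MB.
 * Note that 0 passes this check. */
static int sketch_check_log_length(uint64_t log_length)
{
    return (log_length % SKETCH_LOG_MIN_SIZE) ? -EINVAL : 0;
}

/* Write path: fail if the data cannot fit in the log at all.
 * With a zero-length log, every non-empty write lands here. */
static int sketch_log_can_hold(uint32_t length, uint64_t log_length)
{
    return (length > log_length) ? -EINVAL : 0;
}
```

So a zero-length log opens fine, and only the first metadata write reports
the problem.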

> > +        ret = -EINVAL;
> > +        goto exit;
> > +    }
> > +
> > +    if (vhdx_log_guid_is_zero(&header->log_guid)) {
> > +        vhdx_guid_generate(&new_guid);
> > +        vhdx_update_headers(bs, s, false, &new_guid);
> > +    } else {
> > +        /* currently, we require that the log be flushed after
> > +         * every write. */
> > +        ret = -ENOTSUP;
> 
> Can we make an assertion here?
>

I don't know if we should assert here - the VM could certainly
continue on if this is not the primary drive.
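
One thing worth noting either way: as far as the quoted hunk shows, the else
branch only sets ret and then falls through to the rest of the function.  A
sketch of the return-an-error variant, using a hypothetical pre-check helper
(not a function in the patch), might look like:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical pre-check run at the top of a log write.
 *
 * Returns 0 if the log is empty (zero GUID) and a new entry may be
 * written, or -ENOTSUP if an unflushed log is already present. */
static int sketch_log_write_precheck(bool log_guid_is_zero)
{
    if (!log_guid_is_zero) {
        /* report the condition and stop here, rather than assert()ing
         * and taking the whole VM down */
        return -ENOTSUP;
    }
    /* the caller would generate a new log GUID and update the
     * headers at this point */
    return 0;
}
```

The error then propagates to whoever issued the write, and the guest keeps
running if this is not its primary drive.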

> > +    }
> > +
> > +    /* 0 is an invalid sequence number, but may also represent the first
> > +     * log write (or a wrapped seq) */
> > +    if (s->log.sequence == 0) {
> > +        s->log.sequence = 1;
> > +    }
> > +
> > +    sector_offset = offset % VHDX_LOG_SECTOR_SIZE;
> > +    file_offset = (offset / VHDX_LOG_SECTOR_SIZE) * VHDX_LOG_SECTOR_SIZE;
> > +
> > +    aligned_length = length;
> > +
> > +    /* add in the unaligned head and tail bytes */
> > +    if (sector_offset) {
> > +        leading_length = (VHDX_LOG_SECTOR_SIZE - sector_offset);
> > +        leading_length = leading_length > length ? length : leading_length;
> > +        aligned_length -= leading_length;
> > +        partial_sectors++;
> > +    }
> > +
> > +    sectors = aligned_length / VHDX_LOG_SECTOR_SIZE;
> > +    trailing_length = aligned_length - (sectors * VHDX_LOG_SECTOR_SIZE);
> > +    if (trailing_length) {
> > +        partial_sectors++;
> > +    }
> > +
> > +    sectors += partial_sectors;
> > +
> > +    /* sectors is now how many sectors the data itself takes, not
> > +     * including the header and descriptor metadata */
> > +
> > +    new_hdr = (VHDXLogEntryHeader) {
> > +                .signature           = VHDX_LOG_SIGNATURE,
> > +                .tail                = s->log.tail,
> > +                .sequence_number     = s->log.sequence,
> > +                .descriptor_count    = sectors,
> > +                .reserved            = 0,
> > +                .flushed_file_offset = bdrv_getlength(bs->file),
> > +                .last_file_offset    = bdrv_getlength(bs->file),
> > +              };
> > +
> > +    memcpy(&new_hdr.log_guid, &header->log_guid, sizeof(MSGUID));
> > +
> > +    desc_sectors = vhdx_compute_desc_sectors(new_hdr.descriptor_count);
> > +
> > +    total_length = (desc_sectors + sectors) * VHDX_LOG_SECTOR_SIZE;
> > +    new_hdr.entry_length = total_length;
> > +
> > +    vhdx_log_entry_hdr_le_export(&new_hdr);
> > +
> > +    buffer = qemu_blockalign(bs, total_length);
> > +    memcpy(buffer, &new_hdr, sizeof(new_hdr));
> > +
> > +    new_desc = (VHDXLogDescriptor *) (buffer + sizeof(new_hdr));
> > +    data_sector = buffer + (desc_sectors * VHDX_LOG_SECTOR_SIZE);
> > +    data_tmp = data;
> > +
> > +    /* All log sectors are 4KB, so for any partial sectors we must
> > +     * merge the data with preexisting data from the final file
> > +     * destination */
> > +    merged_sector = qemu_blockalign(bs, VHDX_LOG_SECTOR_SIZE);
> > +
> > +    for (i = 0; i < sectors; i++) {
> > +        new_desc->signature       = VHDX_LOG_DESC_SIGNATURE;
> > +        new_desc->sequence_number = s->log.sequence;
> > +        new_desc->file_offset     = file_offset;
> > +
> > +        if (i == 0 && leading_length) {
> > +            /* partial sector at the front of the buffer */
> > +            ret = bdrv_pread(bs->file, file_offset, merged_sector,
> > +                             VHDX_LOG_SECTOR_SIZE);
> > +            if (ret < 0) {
> > +                goto exit;
> > +            }
> > +            memcpy(merged_sector + sector_offset, data_tmp, leading_length);
> > +            bytes_written = leading_length;
> > +            sector_write = merged_sector;
> > +        } else if (i == sectors - 1 && trailing_length) {
> > +            /* partial sector at the end of the buffer */
> > +            ret = bdrv_pread(bs->file,
> > +                            file_offset + trailing_length,
> > +                            merged_sector + trailing_length,
> > +                            VHDX_LOG_SECTOR_SIZE - trailing_length);
> > +            if (ret < 0) {
> > +                goto exit;
> > +            }
> > +            memcpy(merged_sector, data_tmp, trailing_length);
> > +            bytes_written = trailing_length;
> > +            sector_write = merged_sector;
> > +        } else {
> > +            bytes_written = VHDX_LOG_SECTOR_SIZE;
> > +            sector_write = data_tmp;
> > +        }
> > +
> > +        /* populate the raw sector data into the proper structures,
> > +         * as well as update the descriptor, and convert to proper
> > +         * endianness */
> > +        vhdx_log_raw_to_le_sector(new_desc, data_sector, sector_write,
> > +                                  s->log.sequence);
> > +
> > +        data_tmp += bytes_written;
> > +        data_sector++;
> > +        new_desc++;
> > +        file_offset += VHDX_LOG_SECTOR_SIZE;
> > +    }
> > +
> > +    /* checksum covers entire entry, from the log header through the
> > +     * last data sector */
> > +    vhdx_update_checksum(buffer, total_length, 4);
> > +    cpu_to_le32s((uint32_t *)(buffer + 4));
> > +
> > +    /* now write to the log */
> > +    ret = vhdx_log_write_sectors(bs, &s->log, &sectors_written, buffer,
> > +                                 desc_sectors + sectors);
> > +    if (ret < 0) {
> > +        goto exit;
> > +    }
> > +
> > +    if (sectors_written != desc_sectors + sectors) {
> > +        /* instead of failing, we could flush the log here */
> > +        ret = -EINVAL;
> > +        goto exit;
> > +    }
> > +
> > +    s->log.sequence++;
> > +    /* write new tail */
> > +    s->log.tail = s->log.write;
> > +
> > +exit:
> > +    qemu_vfree(buffer);
> > +    qemu_vfree(merged_sector);
> > +    return ret;
> > +}
> > +
> > +/* Perform a log write, and then immediately flush the entire log */
> > +int vhdx_log_write_and_flush(BlockDriverState *bs, BDRVVHDXState *s,
> > +                             void *data, uint32_t length, uint64_t offset)
> > +{
> > +    int ret = 0;
> > +    VHDXLogSequence logs = { .valid = true,
> > +                             .count = 1,
> > +                             .hdr = { 0 } };
> > +
> > +
> > +    ret = vhdx_log_write(bs, s, data, length, offset);
> > +    if (ret < 0) {
> > +        goto exit;
> > +    }
> > +    logs.log = s->log;
> > +
> > +    ret = vhdx_log_flush(bs, s, &logs);
> > +    if (ret < 0) {
> > +        goto exit;
> > +    }
> > +
> > +    s->log = logs.log;
> > +
> > +exit:
> > +    return ret;
> > +}
> > +
> > diff --git a/block/vhdx.h b/block/vhdx.h
> > index 24b126e..b210efc 100644
> > --- a/block/vhdx.h
> > +++ b/block/vhdx.h
> > @@ -393,6 +393,9 @@ bool vhdx_checksum_is_valid(uint8_t *buf, size_t size, int crc_offset);
> >  
> >  int vhdx_parse_log(BlockDriverState *bs, BDRVVHDXState *s);
> >  
> > +int vhdx_log_write_and_flush(BlockDriverState *bs, BDRVVHDXState *s,
> > +                             void *data, uint32_t length, uint64_t offset);
> > +
> >  static inline void leguid_to_cpus(MSGUID *guid)
> >  {
> >      le32_to_cpus(&guid->data1);
> > -- 
> > 1.8.1.4
> > 
> > 
> 
> -- 
> Fam


end of thread, other threads:[~2013-07-30 14:11 UTC | newest]

Thread overview: 19+ messages
2013-07-24 17:54 [Qemu-devel] [PATCH 0/9] VHDX log replay and write support Jeff Cody
2013-07-24 17:54 ` [Qemu-devel] [PATCH 1/9] block: vhdx - minor comments and typo correction Jeff Cody
2013-07-24 17:54 ` [Qemu-devel] [PATCH 2/9] block: vhdx - add header update capability Jeff Cody
2013-07-26  6:49   ` Fam Zheng
2013-07-26 11:39     ` Jeff Cody
2013-07-24 17:54 ` [Qemu-devel] [PATCH 3/9] block: vhdx code movement - VHDXMetadataEntries and BDRVVHDXState to header Jeff Cody
2013-07-24 17:54 ` [Qemu-devel] [PATCH 4/9] block: vhdx - log support struct and defines Jeff Cody
2013-07-30  3:15   ` Fam Zheng
2013-07-30 13:42     ` Jeff Cody
2013-07-24 17:54 ` [Qemu-devel] [PATCH 5/9] block: vhdx - break endian translation functions out Jeff Cody
2013-07-24 17:54 ` [Qemu-devel] [PATCH 6/9] block: vhdx - update log guid in header, and first write tracker Jeff Cody
2013-07-24 17:54 ` [Qemu-devel] [PATCH 7/9] block: vhdx - log parsing, replay, and flush support Jeff Cody
2013-07-30  3:48   ` Fam Zheng
2013-07-30 13:58     ` Jeff Cody
2013-07-24 17:54 ` [Qemu-devel] [PATCH 8/9] block: vhdx - add log write support Jeff Cody
2013-07-30  3:57   ` Fam Zheng
2013-07-30 14:11     ` Jeff Cody
2013-07-24 17:54 ` [Qemu-devel] [PATCH 9/9] block: vhdx " Jeff Cody
2013-07-30  4:10   ` Fam Zheng
