All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support
@ 2014-12-27 15:01 Peter Wu
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 01/10] block/dmg: properly detect the UDIF trailer Peter Wu
                   ` (10 more replies)
  0 siblings, 11 replies; 39+ messages in thread
From: Peter Wu @ 2014-12-27 15:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi

Hi,

These series improve QEMU support for DMG image files:

 - Images which are not multiples of a block size (512) are now properly
   recognized ("properly detect the UDIF trailer").
 - Block descriptors stored in the XML plist section rather than the Resource
   Fork are now recognized ("process XML plists").
 - The virtual (uncompressed) image size is now advertised ("set virtual size to
   a non-zero value").
 - Files which have a different data offsets are now properly handled ("fix
   sector data offset calculation").
 - bzip2-compressed chunks can now be read ("support bzip2 block entry types").
   This adds a new --enable-bzip2 option.
 - Files with a large all-zeroes data chunk do not trip an error anymore
   ("improve zeroes handling").

The XML plist patch depends on these refactorings (which add no new
functionality):

  block/dmg: extract mish block decoding functionality
  block/dmg: extract processing of resource forks
  block/dmg: process a buffer instead of reading ints

Finally there is this patch which avoids a DoS (large memory allocation) due to
a missing check on untrusted data:

  block/dmg: validate chunk size to avoid overflow

These patches were tested against disk images, but since DMG files are also used
for installers, I used those too. The following files pass these patches:
 - filename + source; real size in bytes; publication date. (additional info:
   whether BLXX is stored in XML plists or a resource fork; compression method)
 - espgs-7.05.5-0.ppc.dmg[1]            5593786 2002-09 (rsrc fork, zlib, sector
                                                         offsets start at 0)
 - Wireshark 1.12.2 Intel 64.dmg[2]    26375047 2014-11 (xml, zlib)
 - tuxpaint-0.9.15-macosx.dmg[3]        9022458 2005-11 (rsrc fork, zlib)
 - vlc-2.1.5.dmg[4]                    33519849 2014-07 (xml, bzlib)
 - OS X Yosemite.dmg                 5189100314 2014-10 (xml, bzlib)
 - InstallESD.dmg (in prev. image)   5164763151 2014-10 (xml, zlib; data offset
                                                         is non-zero)

For these above files, I executed `qemu-img info foo.dmg` to verify that there is
no EINVAL error and `qemu-img convert -f dmg -O raw foo.dmg foo.raw` to check
whether reading works. The resulting `foo.raw` was then compared against
`foo.img` as generated by `dmg2img foo.dmg`.

This file was tested and 'qemu-img info' works properly, but conversion failed
because the Apple Data Compression (ADC) format is not supported[5].
 - NetBoot9.dmg[6]                    534884900 2003-09 (rsrc fork, adc)

These patches (rebased against current master) can be found at:
https://git.lekensteyn.nl/peter/qemu/log/?h=block-dmg-2.3
(development and testing was done against v2.2.0, branch block-dmg-2.2)

Kind regards,
Peter

 [1]: http://sourceforge.net/projects/espgs/files/espgs/7.05.5/
 [2]: https://www.wireshark.org/download/osx/
 [3]: http://sourceforge.net/projects/tuxpaint/files/tuxpaint/0.9.15/
 [4]: http://download.videolan.org/pub/videolan/vlc/2.1.5/macosx/
 [5]: https://bugzilla.redhat.com/show_bug.cgi?id=1058132
 [6]: http://download.info.apple.com/Mac_OS_X/693-4445.20030912.gnr39/NetBoot9.dmg
--
Peter Wu (10):
  block/dmg: properly detect the UDIF trailer
  block/dmg: extract mish block decoding functionality
  block/dmg: extract processing of resource forks
  block/dmg: process a buffer instead of reading ints
  block/dmg: validate chunk size to avoid overflow
  block/dmg: process XML plists
  block/dmg: set virtual size to a non-zero value
  block/dmg: fix sector data offset calculation
  block/dmg: support bzip2 block entry types
  block/dmg: improve zeroes handling

 block/dmg.c | 475 +++++++++++++++++++++++++++++++++++++++++++++---------------
 configure   |  31 ++++
 2 files changed, 390 insertions(+), 116 deletions(-)

-- 
2.2.1

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 01/10] block/dmg: properly detect the UDIF trailer
  2014-12-27 15:01 [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support Peter Wu
@ 2014-12-27 15:01 ` Peter Wu
  2015-01-02 23:58   ` John Snow
  2015-01-06 13:35   ` Stefan Hajnoczi
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 02/10] block/dmg: extract mish block decoding functionality Peter Wu
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 39+ messages in thread
From: Peter Wu @ 2014-12-27 15:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi

DMG files have a variable length with a UDIF trailer at the end of a
file. This UDIF trailer is essential as it describes the contents of
the image. At the moment however, the start of this trailer is almost
always incorrect as bdrv_getlength() returns a multiple of the block
size (rounded up). This results in a failure to recognize DMG files,
resulting in Invalid argument (EINVAL) errors.

As there is no API to retrieve the real file size, look for the magic
header in the last two sectors to find the start of this 512-byte UDIF
trailer (the "koly" block).

The resource fork offset ("info_begin") has its offset adjusted as the
initial value of offset does not mean "end of file" anymore, but "begin
of UDIF trailer".

Signed-off-by: Peter Wu <peter@lekensteyn.nl>
---
 block/dmg.c | 40 ++++++++++++++++++++++++++++++++++++----
 1 file changed, 36 insertions(+), 4 deletions(-)

diff --git a/block/dmg.c b/block/dmg.c
index e455886..df274f9 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -131,6 +131,39 @@ static void update_max_chunk_size(BDRVDMGState *s, uint32_t chunk,
     }
 }
 
+static int64_t dmg_find_koly_offset(BlockDriverState *file_bs)
+{
+    int64_t length;
+    int64_t offset = 0;
+    uint8_t buffer[515];
+    int i, ret;
+
+    /* bdrv_getlength returns a multiple of block size (512), rounded up. Since
+     * dmg images can have odd sizes, try to look for the "koly" magic which
+     * marks the begin of the UDIF trailer (512 bytes). This magic can be found
+     * in the last 511 bytes of the second-last sector or the first 4 bytes of
+     * the last sector (search space: 515 bytes) */
+    length = bdrv_getlength(file_bs);
+    if (length < 512) {
+        return length < 0 ? length : -EINVAL;
+    }
+    if (length > 511 + 512) {
+        offset = length - 511 - 512;
+    }
+    length = length < 515 ? length : 515;
+    ret = bdrv_pread(file_bs, offset, buffer, length);
+    if (ret < 4) {
+        return ret < 0 ? ret : -EINVAL;
+    }
+    for (i = 0; i < length - 3; i++) {
+        if (buffer[i] == 'k' && buffer[i+1] == 'o' &&
+            buffer[i+2] == 'l' && buffer[i+3] == 'y') {
+            return offset + i;
+        }
+    }
+    return -EINVAL;
+}
+
 static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
                     Error **errp)
 {
@@ -145,15 +178,14 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
     s->n_chunks = 0;
     s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
 
-    /* read offset of info blocks */
-    offset = bdrv_getlength(bs->file);
+    /* locate the UDIF trailer */
+    offset = dmg_find_koly_offset(bs->file);
     if (offset < 0) {
         ret = offset;
         goto fail;
     }
-    offset -= 0x1d8;
 
-    ret = read_uint64(bs, offset, &info_begin);
+    ret = read_uint64(bs, offset + 0x28, &info_begin);
     if (ret < 0) {
         goto fail;
     } else if (info_begin == 0) {
-- 
2.2.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 02/10] block/dmg: extract mish block decoding functionality
  2014-12-27 15:01 [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support Peter Wu
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 01/10] block/dmg: properly detect the UDIF trailer Peter Wu
@ 2014-12-27 15:01 ` Peter Wu
  2015-01-02 23:59   ` John Snow
  2015-01-06 13:42   ` Stefan Hajnoczi
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 03/10] block/dmg: extract processing of resource forks Peter Wu
                   ` (8 subsequent siblings)
  10 siblings, 2 replies; 39+ messages in thread
From: Peter Wu @ 2014-12-27 15:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi

Extract the mish block decoder such that this can be used for other
formats in the future. A new DmgHeaderState struct is introduced to
share state while decoding.

The code is kept unchanged as much as possible, a "fail" label is added
for example where a simple return would probably do.

Signed-off-by: Peter Wu <peter@lekensteyn.nl>
---
 block/dmg.c | 216 +++++++++++++++++++++++++++++++++++-------------------------
 1 file changed, 125 insertions(+), 91 deletions(-)

diff --git a/block/dmg.c b/block/dmg.c
index df274f9..6dc6dbb 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -164,19 +164,137 @@ static int64_t dmg_find_koly_offset(BlockDriverState *file_bs)
     return -EINVAL;
 }
 
+/* used when building the sector table */
+typedef struct DmgHeaderState {
+    /* used internally by dmg_read_mish_block to remember offsets of blocks
+     * across calls */
+    uint64_t last_in_offset;
+    uint64_t last_out_offset;
+    /* exported for dmg_open */
+    uint32_t max_compressed_size;
+    uint32_t max_sectors_per_chunk;
+} DmgHeaderState;
+
+static int dmg_read_mish_block(BlockDriverState *bs, DmgHeaderState *ds,
+                               int64_t offset, uint32_t count)
+{
+    BDRVDMGState *s = bs->opaque;
+    uint32_t type, i;
+    int ret;
+    size_t new_size;
+    uint32_t chunk_count;
+
+    ret = read_uint32(bs, offset, &type);
+    if (ret < 0) {
+        goto fail;
+    }
+
+    /* skip data that is not a valid MISH block (invalid magic or too small) */
+    if (type != 0x6d697368 || count < 244) {
+        /* assume success for now */
+        return 0;
+    }
+
+    offset += 4;
+    offset += 200;
+
+    chunk_count = (count - 204) / 40;
+    new_size = sizeof(uint64_t) * (s->n_chunks + chunk_count);
+    s->types = g_realloc(s->types, new_size / 2);
+    s->offsets = g_realloc(s->offsets, new_size);
+    s->lengths = g_realloc(s->lengths, new_size);
+    s->sectors = g_realloc(s->sectors, new_size);
+    s->sectorcounts = g_realloc(s->sectorcounts, new_size);
+
+    for (i = s->n_chunks; i < s->n_chunks + chunk_count; i++) {
+        ret = read_uint32(bs, offset, &s->types[i]);
+        if (ret < 0) {
+            goto fail;
+        }
+        offset += 4;
+        if (s->types[i] != 0x80000005 && s->types[i] != 1 &&
+            s->types[i] != 2) {
+            if (s->types[i] == 0xffffffff && i > 0) {
+                ds->last_in_offset = s->offsets[i - 1] + s->lengths[i - 1];
+                ds->last_out_offset = s->sectors[i - 1] +
+                                      s->sectorcounts[i - 1];
+            }
+            chunk_count--;
+            i--;
+            offset += 36;
+            continue;
+        }
+        offset += 4;
+
+        ret = read_uint64(bs, offset, &s->sectors[i]);
+        if (ret < 0) {
+            goto fail;
+        }
+        s->sectors[i] += ds->last_out_offset;
+        offset += 8;
+
+        ret = read_uint64(bs, offset, &s->sectorcounts[i]);
+        if (ret < 0) {
+            goto fail;
+        }
+        offset += 8;
+
+        if (s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
+            error_report("sector count %" PRIu64 " for chunk %" PRIu32
+                         " is larger than max (%u)",
+                         s->sectorcounts[i], i, DMG_SECTORCOUNTS_MAX);
+            ret = -EINVAL;
+            goto fail;
+        }
+
+        ret = read_uint64(bs, offset, &s->offsets[i]);
+        if (ret < 0) {
+            goto fail;
+        }
+        s->offsets[i] += ds->last_in_offset;
+        offset += 8;
+
+        ret = read_uint64(bs, offset, &s->lengths[i]);
+        if (ret < 0) {
+            goto fail;
+        }
+        offset += 8;
+
+        if (s->lengths[i] > DMG_LENGTHS_MAX) {
+            error_report("length %" PRIu64 " for chunk %" PRIu32
+                         " is larger than max (%u)",
+                         s->lengths[i], i, DMG_LENGTHS_MAX);
+            ret = -EINVAL;
+            goto fail;
+        }
+
+        update_max_chunk_size(s, i, &ds->max_compressed_size,
+                              &ds->max_sectors_per_chunk);
+    }
+    s->n_chunks += chunk_count;
+    return 0;
+
+fail:
+    return ret;
+}
+
 static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
                     Error **errp)
 {
     BDRVDMGState *s = bs->opaque;
-    uint64_t info_begin, info_end, last_in_offset, last_out_offset;
+    DmgHeaderState ds;
+    uint64_t info_begin, info_end;
     uint32_t count, tmp;
-    uint32_t max_compressed_size = 1, max_sectors_per_chunk = 1, i;
     int64_t offset;
     int ret;
 
     bs->read_only = 1;
     s->n_chunks = 0;
     s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
+    ds.last_in_offset = 0;
+    ds.last_out_offset = 0;
+    ds.max_compressed_size = 1;
+    ds.max_sectors_per_chunk = 1;
 
     /* locate the UDIF trailer */
     offset = dmg_find_koly_offset(bs->file);
@@ -210,13 +328,11 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
     }
     info_end = info_begin + count;
 
+    /* begin of mish block */
     offset = info_begin + 0x100;
 
     /* read offsets */
-    last_in_offset = last_out_offset = 0;
     while (offset < info_end) {
-        uint32_t type;
-
         ret = read_uint32(bs, offset, &count);
         if (ret < 0) {
             goto fail;
@@ -226,100 +342,18 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
         }
         offset += 4;
 
-        ret = read_uint32(bs, offset, &type);
+        ret = dmg_read_mish_block(bs, &ds, offset, count);
         if (ret < 0) {
             goto fail;
         }
-
-        if (type == 0x6d697368 && count >= 244) {
-            size_t new_size;
-            uint32_t chunk_count;
-
-            offset += 4;
-            offset += 200;
-
-            chunk_count = (count - 204) / 40;
-            new_size = sizeof(uint64_t) * (s->n_chunks + chunk_count);
-            s->types = g_realloc(s->types, new_size / 2);
-            s->offsets = g_realloc(s->offsets, new_size);
-            s->lengths = g_realloc(s->lengths, new_size);
-            s->sectors = g_realloc(s->sectors, new_size);
-            s->sectorcounts = g_realloc(s->sectorcounts, new_size);
-
-            for (i = s->n_chunks; i < s->n_chunks + chunk_count; i++) {
-                ret = read_uint32(bs, offset, &s->types[i]);
-                if (ret < 0) {
-                    goto fail;
-                }
-                offset += 4;
-                if (s->types[i] != 0x80000005 && s->types[i] != 1 &&
-                    s->types[i] != 2) {
-                    if (s->types[i] == 0xffffffff && i > 0) {
-                        last_in_offset = s->offsets[i - 1] + s->lengths[i - 1];
-                        last_out_offset = s->sectors[i - 1] +
-                                          s->sectorcounts[i - 1];
-                    }
-                    chunk_count--;
-                    i--;
-                    offset += 36;
-                    continue;
-                }
-                offset += 4;
-
-                ret = read_uint64(bs, offset, &s->sectors[i]);
-                if (ret < 0) {
-                    goto fail;
-                }
-                s->sectors[i] += last_out_offset;
-                offset += 8;
-
-                ret = read_uint64(bs, offset, &s->sectorcounts[i]);
-                if (ret < 0) {
-                    goto fail;
-                }
-                offset += 8;
-
-                if (s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
-                    error_report("sector count %" PRIu64 " for chunk %" PRIu32
-                                 " is larger than max (%u)",
-                                 s->sectorcounts[i], i, DMG_SECTORCOUNTS_MAX);
-                    ret = -EINVAL;
-                    goto fail;
-                }
-
-                ret = read_uint64(bs, offset, &s->offsets[i]);
-                if (ret < 0) {
-                    goto fail;
-                }
-                s->offsets[i] += last_in_offset;
-                offset += 8;
-
-                ret = read_uint64(bs, offset, &s->lengths[i]);
-                if (ret < 0) {
-                    goto fail;
-                }
-                offset += 8;
-
-                if (s->lengths[i] > DMG_LENGTHS_MAX) {
-                    error_report("length %" PRIu64 " for chunk %" PRIu32
-                                 " is larger than max (%u)",
-                                 s->lengths[i], i, DMG_LENGTHS_MAX);
-                    ret = -EINVAL;
-                    goto fail;
-                }
-
-                update_max_chunk_size(s, i, &max_compressed_size,
-                                      &max_sectors_per_chunk);
-            }
-            s->n_chunks += chunk_count;
-        }
+        offset += count;
     }
 
     /* initialize zlib engine */
     s->compressed_chunk = qemu_try_blockalign(bs->file,
-                                              max_compressed_size + 1);
+                                              ds.max_compressed_size + 1);
     s->uncompressed_chunk = qemu_try_blockalign(bs->file,
-                                                512 * max_sectors_per_chunk);
+                                                512 * ds.max_sectors_per_chunk);
     if (s->compressed_chunk == NULL || s->uncompressed_chunk == NULL) {
         ret = -ENOMEM;
         goto fail;
-- 
2.2.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 03/10] block/dmg: extract processing of resource forks
  2014-12-27 15:01 [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support Peter Wu
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 01/10] block/dmg: properly detect the UDIF trailer Peter Wu
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 02/10] block/dmg: extract mish block decoding functionality Peter Wu
@ 2014-12-27 15:01 ` Peter Wu
  2015-01-03  0:01   ` John Snow
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 04/10] block/dmg: process a buffer instead of reading ints Peter Wu
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Peter Wu @ 2014-12-27 15:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi

Besides the offset, also read the resource length. This length is now
used in the extracted function to verify the end of the resource fork
against "count" from the resource fork.

Signed-off-by: Peter Wu <peter@lekensteyn.nl>
---
 block/dmg.c | 90 ++++++++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 59 insertions(+), 31 deletions(-)

diff --git a/block/dmg.c b/block/dmg.c
index 6dc6dbb..7f49388 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -278,38 +278,13 @@ fail:
     return ret;
 }
 
-static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
-                    Error **errp)
+static int dmg_read_resource_fork(BlockDriverState *bs, DmgHeaderState *ds,
+                                  uint64_t info_begin, uint64_t info_length)
 {
-    BDRVDMGState *s = bs->opaque;
-    DmgHeaderState ds;
-    uint64_t info_begin, info_end;
-    uint32_t count, tmp;
-    int64_t offset;
     int ret;
-
-    bs->read_only = 1;
-    s->n_chunks = 0;
-    s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
-    ds.last_in_offset = 0;
-    ds.last_out_offset = 0;
-    ds.max_compressed_size = 1;
-    ds.max_sectors_per_chunk = 1;
-
-    /* locate the UDIF trailer */
-    offset = dmg_find_koly_offset(bs->file);
-    if (offset < 0) {
-        ret = offset;
-        goto fail;
-    }
-
-    ret = read_uint64(bs, offset + 0x28, &info_begin);
-    if (ret < 0) {
-        goto fail;
-    } else if (info_begin == 0) {
-        ret = -EINVAL;
-        goto fail;
-    }
+    uint32_t count, tmp;
+    uint64_t info_end;
+    uint64_t offset;
 
     ret = read_uint32(bs, info_begin, &tmp);
     if (ret < 0) {
@@ -326,6 +301,10 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
         ret = -EINVAL;
         goto fail;
     }
+    if (count > info_length) {
+        ret = -EINVAL;
+        goto fail;
+    }
     info_end = info_begin + count;
 
     /* begin of mish block */
@@ -342,12 +321,61 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
         }
         offset += 4;
 
-        ret = dmg_read_mish_block(bs, &ds, offset, count);
+        ret = dmg_read_mish_block(bs, ds, offset, count);
         if (ret < 0) {
             goto fail;
         }
         offset += count;
     }
+    return 0;
+
+fail:
+    return ret;
+}
+
+static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
+                    Error **errp)
+{
+    BDRVDMGState *s = bs->opaque;
+    DmgHeaderState ds;
+    uint64_t rsrc_fork_offset, rsrc_fork_length;
+    int64_t offset;
+    int ret;
+
+    bs->read_only = 1;
+    s->n_chunks = 0;
+    s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
+    ds.last_in_offset = 0;
+    ds.last_out_offset = 0;
+    ds.max_compressed_size = 1;
+    ds.max_sectors_per_chunk = 1;
+
+    /* locate the UDIF trailer */
+    offset = dmg_find_koly_offset(bs->file);
+    if (offset < 0) {
+        ret = offset;
+        goto fail;
+    }
+
+    /* offset of resource fork (RsrcForkOffset) */
+    ret = read_uint64(bs, offset + 0x28, &rsrc_fork_offset);
+    if (ret < 0) {
+        goto fail;
+    }
+    ret = read_uint64(bs, offset + 0x30, &rsrc_fork_length);
+    if (ret < 0) {
+        goto fail;
+    }
+    if (rsrc_fork_offset != 0 && rsrc_fork_length != 0) {
+        ret = dmg_read_resource_fork(bs, &ds,
+                                     rsrc_fork_offset, rsrc_fork_length);
+        if (ret < 0) {
+            goto fail;
+        }
+    } else {
+        ret = -EINVAL;
+        goto fail;
+    }
 
     /* initialize zlib engine */
     s->compressed_chunk = qemu_try_blockalign(bs->file,
-- 
2.2.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 04/10] block/dmg: process a buffer instead of reading ints
  2014-12-27 15:01 [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support Peter Wu
                   ` (2 preceding siblings ...)
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 03/10] block/dmg: extract processing of resource forks Peter Wu
@ 2014-12-27 15:01 ` Peter Wu
  2015-01-03  0:01   ` John Snow
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 05/10] block/dmg: validate chunk size to avoid overflow Peter Wu
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Peter Wu @ 2014-12-27 15:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi

As the decoded plist XML is not a pointer in the file,
dmg_read_mish_block must be able to process a buffer instead of a file
pointer. Since the full buffer must be processed, let's change the
return value again to just a success flag.

Signed-off-by: Peter Wu <peter@lekensteyn.nl>
---
 block/dmg.c | 60 ++++++++++++++++++++++++++++++------------------------------
 1 file changed, 30 insertions(+), 30 deletions(-)

diff --git a/block/dmg.c b/block/dmg.c
index 7f49388..75e771a 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -100,6 +100,16 @@ static int read_uint32(BlockDriverState *bs, int64_t offset, uint32_t *result)
     return 0;
 }
 
+static inline uint64_t buff_read_uint64(const uint8_t *buffer, int64_t offset)
+{
+    return be64_to_cpu(*(uint64_t *)&buffer[offset]);
+}
+
+static inline uint32_t buff_read_uint32(const uint8_t *buffer, int64_t offset)
+{
+    return be32_to_cpu(*(uint32_t *)&buffer[offset]);
+}
+
 /* Increase max chunk sizes, if necessary.  This function is used to calculate
  * the buffer sizes needed for compressed/uncompressed chunk I/O.
  */
@@ -175,20 +185,16 @@ typedef struct DmgHeaderState {
     uint32_t max_sectors_per_chunk;
 } DmgHeaderState;
 
-static int dmg_read_mish_block(BlockDriverState *bs, DmgHeaderState *ds,
-                               int64_t offset, uint32_t count)
+static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds,
+                               uint8_t *buffer, uint32_t count)
 {
-    BDRVDMGState *s = bs->opaque;
     uint32_t type, i;
     int ret;
     size_t new_size;
     uint32_t chunk_count;
+    int64_t offset = 0;
 
-    ret = read_uint32(bs, offset, &type);
-    if (ret < 0) {
-        goto fail;
-    }
-
+    type = buff_read_uint32(buffer, offset);
     /* skip data that is not a valid MISH block (invalid magic or too small) */
     if (type != 0x6d697368 || count < 244) {
         /* assume success for now */
@@ -207,10 +213,7 @@ static int dmg_read_mish_block(BlockDriverState *bs, DmgHeaderState *ds,
     s->sectorcounts = g_realloc(s->sectorcounts, new_size);
 
     for (i = s->n_chunks; i < s->n_chunks + chunk_count; i++) {
-        ret = read_uint32(bs, offset, &s->types[i]);
-        if (ret < 0) {
-            goto fail;
-        }
+        s->types[i] = buff_read_uint32(buffer, offset);
         offset += 4;
         if (s->types[i] != 0x80000005 && s->types[i] != 1 &&
             s->types[i] != 2) {
@@ -226,17 +229,11 @@ static int dmg_read_mish_block(BlockDriverState *bs, DmgHeaderState *ds,
         }
         offset += 4;
 
-        ret = read_uint64(bs, offset, &s->sectors[i]);
-        if (ret < 0) {
-            goto fail;
-        }
+        s->sectors[i] = buff_read_uint64(buffer, offset);
         s->sectors[i] += ds->last_out_offset;
         offset += 8;
 
-        ret = read_uint64(bs, offset, &s->sectorcounts[i]);
-        if (ret < 0) {
-            goto fail;
-        }
+        s->sectorcounts[i] = buff_read_uint64(buffer, offset);
         offset += 8;
 
         if (s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
@@ -247,17 +244,11 @@ static int dmg_read_mish_block(BlockDriverState *bs, DmgHeaderState *ds,
             goto fail;
         }
 
-        ret = read_uint64(bs, offset, &s->offsets[i]);
-        if (ret < 0) {
-            goto fail;
-        }
+        s->offsets[i] = buff_read_uint64(buffer, offset);
         s->offsets[i] += ds->last_in_offset;
         offset += 8;
 
-        ret = read_uint64(bs, offset, &s->lengths[i]);
-        if (ret < 0) {
-            goto fail;
-        }
+        s->lengths[i] = buff_read_uint64(buffer, offset);
         offset += 8;
 
         if (s->lengths[i] > DMG_LENGTHS_MAX) {
@@ -281,8 +272,10 @@ fail:
 static int dmg_read_resource_fork(BlockDriverState *bs, DmgHeaderState *ds,
                                   uint64_t info_begin, uint64_t info_length)
 {
+    BDRVDMGState *s = bs->opaque;
     int ret;
     uint32_t count, tmp;
+    uint8_t *buffer = NULL;
     uint64_t info_end;
     uint64_t offset;
 
@@ -321,15 +314,22 @@ static int dmg_read_resource_fork(BlockDriverState *bs, DmgHeaderState *ds,
         }
         offset += 4;
 
-        ret = dmg_read_mish_block(bs, ds, offset, count);
+        buffer = g_realloc(buffer, count);
+        ret = bdrv_pread(bs->file, offset, buffer, count);
+        if (ret < 0) {
+            goto fail;
+        }
+
+        ret = dmg_read_mish_block(s, ds, buffer, count);
         if (ret < 0) {
             goto fail;
         }
         offset += count;
     }
-    return 0;
+    ret = 0;
 
 fail:
+    g_free(buffer);
     return ret;
 }
 
-- 
2.2.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 05/10] block/dmg: validate chunk size to avoid overflow
  2014-12-27 15:01 [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support Peter Wu
                   ` (3 preceding siblings ...)
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 04/10] block/dmg: process a buffer instead of reading ints Peter Wu
@ 2014-12-27 15:01 ` Peter Wu
  2015-01-03  0:02   ` John Snow
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 06/10] block/dmg: process XML plists Peter Wu
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Peter Wu @ 2014-12-27 15:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi

Previously the chunk size was not checked, allowing for a large memory
allocation. This patch checks whether the chunks size is within the
resource fork length.

Signed-off-by: Peter Wu <peter@lekensteyn.nl>
---
 block/dmg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/dmg.c b/block/dmg.c
index 75e771a..19e4fe2 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -308,7 +308,7 @@ static int dmg_read_resource_fork(BlockDriverState *bs, DmgHeaderState *ds,
         ret = read_uint32(bs, offset, &count);
         if (ret < 0) {
             goto fail;
-        } else if (count == 0) {
+        } else if (count == 0 || count > info_end - offset) {
             ret = -EINVAL;
             goto fail;
         }
-- 
2.2.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 06/10] block/dmg: process XML plists
  2014-12-27 15:01 [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support Peter Wu
                   ` (4 preceding siblings ...)
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 05/10] block/dmg: validate chunk size to avoid overflow Peter Wu
@ 2014-12-27 15:01 ` Peter Wu
  2015-01-03  0:04   ` John Snow
  2015-01-05 16:54   ` John Snow
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 07/10] block/dmg: set virtual size to a non-zero value Peter Wu
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 39+ messages in thread
From: Peter Wu @ 2014-12-27 15:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi

The format is simple enough to avoid using a full-blown XML parser.
The offsets are based on the description at
http://newosxbook.com/DMG.html

Signed-off-by: Peter Wu <peter@lekensteyn.nl>
---
 block/dmg.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 69 insertions(+)

diff --git a/block/dmg.c b/block/dmg.c
index 19e4fe2..c03ea01 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -26,6 +26,7 @@
 #include "qemu/bswap.h"
 #include "qemu/module.h"
 #include <zlib.h>
+#include <glib.h>
 
 enum {
     /* Limit chunk sizes to prevent unreasonable amounts of memory being used
@@ -333,12 +334,66 @@ fail:
     return ret;
 }
 
+static int dmg_read_plist_xml(BlockDriverState *bs, DmgHeaderState *ds,
+                              uint64_t info_begin, uint64_t info_length)
+{
+    BDRVDMGState *s = bs->opaque;
+    int ret;
+    uint8_t *buffer = NULL;
+    char *data_begin, *data_end;
+
+    /* Have at least some length to avoid NULL for g_malloc. Attempt to set a
+     * safe upper cap on the data length. A test sample had a XML length of
+     * about 1 MiB. */
+    if (info_length == 0 || info_length > 16 * 1024 * 1024) {
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    buffer = g_malloc(info_length + 1);
+    buffer[info_length] = '\0';
+    ret = bdrv_pread(bs->file, info_begin, buffer, info_length);
+    if (ret != info_length) {
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    /* look for <data>...</data>. The data is 284 (0x11c) bytes after base64
+     * decode. The actual data element has 431 (0x1af) bytes which includes tabs
+     * and line feeds. */
+    data_end = (char *)buffer;
+    while ((data_begin = strstr(data_end, "<data>")) != NULL) {
+        gsize out_len = 0;
+
+        data_begin += 6;
+        data_end = strstr(data_begin, "</data>");
+        /* malformed XML? */
+        if (data_end == NULL) {
+            ret = -EINVAL;
+            goto fail;
+        }
+        *data_end++ = '\0';
+        g_base64_decode_inplace(data_begin, &out_len);
+        ret = dmg_read_mish_block(s, ds, (uint8_t *)data_begin,
+                                  (uint32_t)out_len);
+        if (ret < 0) {
+            goto fail;
+        }
+    }
+    ret = 0;
+
+fail:
+    g_free(buffer);
+    return ret;
+}
+
 static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
                     Error **errp)
 {
     BDRVDMGState *s = bs->opaque;
     DmgHeaderState ds;
     uint64_t rsrc_fork_offset, rsrc_fork_length;
+    uint64_t plist_xml_offset, plist_xml_length;
     int64_t offset;
     int ret;
 
@@ -366,12 +421,26 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
     if (ret < 0) {
         goto fail;
     }
+    /* offset of property list (XMLOffset) */
+    ret = read_uint64(bs, offset + 0xd8, &plist_xml_offset);
+    if (ret < 0) {
+        goto fail;
+    }
+    ret = read_uint64(bs, offset + 0xe0, &plist_xml_length);
+    if (ret < 0) {
+        goto fail;
+    }
     if (rsrc_fork_offset != 0 && rsrc_fork_length != 0) {
         ret = dmg_read_resource_fork(bs, &ds,
                                      rsrc_fork_offset, rsrc_fork_length);
         if (ret < 0) {
             goto fail;
         }
+    } else if (plist_xml_offset != 0 && plist_xml_length != 0) {
+        ret = dmg_read_plist_xml(bs, &ds, plist_xml_offset, plist_xml_length);
+        if (ret < 0) {
+            goto fail;
+        }
     } else {
         ret = -EINVAL;
         goto fail;
-- 
2.2.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 07/10] block/dmg: set virtual size to a non-zero value
  2014-12-27 15:01 [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support Peter Wu
                   ` (5 preceding siblings ...)
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 06/10] block/dmg: process XML plists Peter Wu
@ 2014-12-27 15:01 ` Peter Wu
  2015-01-03  0:04   ` John Snow
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 08/10] block/dmg: fix sector data offset calculation Peter Wu
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Peter Wu @ 2014-12-27 15:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi

Right now the virtual size is always reported as zero which makes it
impossible to convert between formats.

After this patch, the number of sectors will be read from the BLXX
("mish") header.

To verify the behavior, the output of `dmg2img foo.dmg foo.img` was
compared against `qemu-img convert -f dmg -O raw foo.dmg foo.raw`. The
tests showed that the file contents are exactly the same, except that
QEMU creates a slightly larger file (it matches the total sectors
count).

Signed-off-by: Peter Wu <peter@lekensteyn.nl>
---
 block/dmg.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/block/dmg.c b/block/dmg.c
index c03ea01..984997f 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -430,6 +430,14 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
     if (ret < 0) {
         goto fail;
     }
+    ret = read_uint64(bs, offset + 0x1ec, (uint64_t *)&bs->total_sectors);
+    if (ret < 0) {
+        goto fail;
+    }
+    if (bs->total_sectors < 0) {
+        ret = -EINVAL;
+        goto fail;
+    }
     if (rsrc_fork_offset != 0 && rsrc_fork_length != 0) {
         ret = dmg_read_resource_fork(bs, &ds,
                                      rsrc_fork_offset, rsrc_fork_length);
-- 
2.2.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 08/10] block/dmg: fix sector data offset calculation
  2014-12-27 15:01 [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support Peter Wu
                   ` (6 preceding siblings ...)
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 07/10] block/dmg: set virtual size to a non-zero value Peter Wu
@ 2014-12-27 15:01 ` Peter Wu
  2015-01-03  0:05   ` John Snow
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 09/10] block/dmg: support bzip2 block entry types Peter Wu
                   ` (2 subsequent siblings)
  10 siblings, 1 reply; 39+ messages in thread
From: Peter Wu @ 2014-12-27 15:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi

This patch addresses two issues:

 - The data fork offset was not taken into account, resulting in failure
   to read an InstallESD.dmg file (5164763151 bytes) which had a
   non-zero DataForkOffset field.
 - The offset of the previous block ("partition") was unconditionally
   added to the current block because older files would start the input
   offset of a new block at zero. Newer files (including vlc-2.1.5.dmg,
   tuxpaint-0.9.15-macosx.dmg and OS X Yosemite [MAS].dmg) failed in
   reads because these files have a continuous input offset.

Signed-off-by: Peter Wu <peter@lekensteyn.nl>
---
 block/dmg.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/block/dmg.c b/block/dmg.c
index 984997f..93b597f 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -179,6 +179,7 @@ static int64_t dmg_find_koly_offset(BlockDriverState *file_bs)
 typedef struct DmgHeaderState {
     /* used internally by dmg_read_mish_block to remember offsets of blocks
      * across calls */
+    uint64_t data_fork_offset;
     uint64_t last_in_offset;
     uint64_t last_out_offset;
     /* exported for dmg_open */
@@ -194,6 +195,7 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds,
     size_t new_size;
     uint32_t chunk_count;
     int64_t offset = 0;
+    uint64_t in_offset = ds->data_fork_offset;
 
     type = buff_read_uint32(buffer, offset);
     /* skip data that is not a valid MISH block (invalid magic or too small) */
@@ -246,7 +248,12 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds,
         }
 
         s->offsets[i] = buff_read_uint64(buffer, offset);
-        s->offsets[i] += ds->last_in_offset;
+        /* If this offset is below the previous chunk end, then assume that all
+         * following offsets are after the previous chunks. */
+        if (s->offsets[i] + in_offset < ds->last_in_offset) {
+            in_offset = ds->last_in_offset;
+        }
+        s->offsets[i] += in_offset;
         offset += 8;
 
         s->lengths[i] = buff_read_uint64(buffer, offset);
@@ -400,6 +407,7 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
     bs->read_only = 1;
     s->n_chunks = 0;
     s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
+    ds.data_fork_offset = 0;
     ds.last_in_offset = 0;
     ds.last_out_offset = 0;
     ds.max_compressed_size = 1;
@@ -412,6 +420,12 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
         goto fail;
     }
 
+    /* offset of data fork (DataForkOffset) */
+    ret = read_uint64(bs, offset + 0x18, &ds.data_fork_offset);
+    if (ret < 0) {
+        goto fail;
+    }
+
     /* offset of resource fork (RsrcForkOffset) */
     ret = read_uint64(bs, offset + 0x28, &rsrc_fork_offset);
     if (ret < 0) {
-- 
2.2.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 09/10] block/dmg: support bzip2 block entry types
  2014-12-27 15:01 [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support Peter Wu
                   ` (7 preceding siblings ...)
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 08/10] block/dmg: fix sector data offset calculation Peter Wu
@ 2014-12-27 15:01 ` Peter Wu
  2015-01-05 19:32   ` John Snow
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 10/10] block/dmg: improve zeroes handling Peter Wu
  2015-01-02 14:14 ` [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support Stefan Hajnoczi
  10 siblings, 1 reply; 39+ messages in thread
From: Peter Wu @ 2014-12-27 15:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi

This patch adds support for bzip2-compressed block entries as introduced
with OS X 10.4 (source: https://en.wikipedia.org/wiki/Apple_Disk_Image).

It was tested against a 5.2G "OS X Yosemite" installation image which
stores the BLXX block in the XML property list (instead of resource
forks) and has over 5k chunks.

New configure entries are added (--enable-bzip2 / --disable-bzip2) to
control inclusion of bzip2 functionality (which requires linking against
libbz2). The help message suggests that this option is needed for DMG
files, but the tests are generic enough that other parts of QEMU can use
bzip2 if needed.

The identifiers are based on http://newosxbook.com/DMG.html.

The decompression routines are based on the zlib case, but as there is
no way to reset the decompression state (unlike zlib), memory is
allocated and deallocated for every decompression. This should not be
problematic as the decompression takes most of the time and as blocks
are typically about/over 1 MiB in size, only one allocation is done
every 2000 sectors.

Signed-off-by: Peter Wu <peter@lekensteyn.nl>
---
 block/dmg.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
 configure   | 31 +++++++++++++++++++++++++++++++
 2 files changed, 84 insertions(+), 3 deletions(-)

diff --git a/block/dmg.c b/block/dmg.c
index 93b597f..67d4e2b 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -26,6 +26,9 @@
 #include "qemu/bswap.h"
 #include "qemu/module.h"
 #include <zlib.h>
+#ifdef CONFIG_BZIP2
+#include <bzlib.h>
+#endif
 #include <glib.h>
 
 enum {
@@ -56,6 +59,9 @@ typedef struct BDRVDMGState {
     uint8_t *compressed_chunk;
     uint8_t *uncompressed_chunk;
     z_stream zstream;
+#ifdef CONFIG_BZIP2
+    bz_stream bzstream;
+#endif
 } BDRVDMGState;
 
 static int dmg_probe(const uint8_t *buf, int buf_size, const char *filename)
@@ -123,6 +129,7 @@ static void update_max_chunk_size(BDRVDMGState *s, uint32_t chunk,
 
     switch (s->types[chunk]) {
     case 0x80000005: /* zlib compressed */
+    case 0x80000006: /* bzip2 compressed */
         compressed_size = s->lengths[chunk];
         uncompressed_sectors = s->sectorcounts[chunk];
         break;
@@ -187,6 +194,21 @@ typedef struct DmgHeaderState {
     uint32_t max_sectors_per_chunk;
 } DmgHeaderState;
 
+static bool dmg_is_valid_block_type(uint32_t entry_type)
+{
+    switch (entry_type) {
+    case 0x00000001:    /* uncompressed */
+    case 0x00000002:    /* zeroes */
+    case 0x80000005:    /* zlib */
+#ifdef CONFIG_BZIP2
+    case 0x80000006:    /* bzip2 */
+#endif
+        return true;
+    default:
+        return false;
+    }
+}
+
 static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds,
                                uint8_t *buffer, uint32_t count)
 {
@@ -218,8 +240,7 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds,
     for (i = s->n_chunks; i < s->n_chunks + chunk_count; i++) {
         s->types[i] = buff_read_uint32(buffer, offset);
         offset += 4;
-        if (s->types[i] != 0x80000005 && s->types[i] != 1 &&
-            s->types[i] != 2) {
+        if (!dmg_is_valid_block_type(s->types[i])) {
             if (s->types[i] == 0xffffffff && i > 0) {
                 ds->last_in_offset = s->offsets[i - 1] + s->lengths[i - 1];
                 ds->last_out_offset = s->sectors[i - 1] +
@@ -534,13 +555,14 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
     if (!is_sector_in_chunk(s, s->current_chunk, sector_num)) {
         int ret;
         uint32_t chunk = search_chunk(s, sector_num);
+        uint64_t total_out;
 
         if (chunk >= s->n_chunks) {
             return -1;
         }
 
         s->current_chunk = s->n_chunks;
-        switch (s->types[chunk]) {
+        switch (s->types[chunk]) { /* block entry type */
         case 0x80000005: { /* zlib compressed */
             /* we need to buffer, because only the chunk as whole can be
              * inflated. */
@@ -564,6 +586,34 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
                 return -1;
             }
             break; }
+#ifdef CONFIG_BZIP2
+        case 0x80000006: /* bzip2 compressed */
+            /* we need to buffer, because only the chunk as whole can be
+             * inflated. */
+            ret = bdrv_pread(bs->file, s->offsets[chunk],
+                             s->compressed_chunk, s->lengths[chunk]);
+            if (ret != s->lengths[chunk]) {
+                return -1;
+            }
+
+            ret = BZ2_bzDecompressInit(&s->bzstream, 0, 0);
+            if (ret != BZ_OK) {
+                return -1;
+            }
+            s->bzstream.next_in = (char *)s->compressed_chunk;
+            s->bzstream.avail_in = (unsigned int) s->lengths[chunk];
+            s->bzstream.next_out = (char *)s->uncompressed_chunk;
+            s->bzstream.avail_out = (unsigned int) 512 * s->sectorcounts[chunk];
+            ret = BZ2_bzDecompress(&s->bzstream);
+            total_out = ((uint64_t)s->bzstream.total_out_hi32 << 32) +
+                        s->bzstream.total_out_lo32;
+            BZ2_bzDecompressEnd(&s->bzstream);
+            if (ret != BZ_STREAM_END ||
+                total_out != 512 * s->sectorcounts[chunk]) {
+                return -1;
+            }
+            break;
+#endif /* CONFIG_BZIP2 */
         case 1: /* copy */
             ret = bdrv_pread(bs->file, s->offsets[chunk],
                              s->uncompressed_chunk, s->lengths[chunk]);
diff --git a/configure b/configure
index cae588c..eadae63 100755
--- a/configure
+++ b/configure
@@ -313,6 +313,7 @@ glx=""
 zlib="yes"
 lzo=""
 snappy=""
+bzip2=""
 guest_agent=""
 guest_agent_with_vss="no"
 vss_win32_sdk=""
@@ -1060,6 +1061,10 @@ for opt do
   ;;
   --enable-snappy) snappy="yes"
   ;;
+  --disable-bzip2) bzip2="no"
+  ;;
+  --enable-bzip2) bzip2="yes"
+  ;;
   --enable-guest-agent) guest_agent="yes"
   ;;
   --disable-guest-agent) guest_agent="no"
@@ -1374,6 +1379,8 @@ Advanced options (experts only):
   --enable-usb-redir       enable usb network redirection support
   --enable-lzo             enable the support of lzo compression library
   --enable-snappy          enable the support of snappy compression library
+  --enable-bzip2           enable the support of bzip2 compression library (for
+                           reading bzip2-compressed dmg images)
   --disable-guest-agent    disable building of the QEMU Guest Agent
   --enable-guest-agent     enable building of the QEMU Guest Agent
   --with-vss-sdk=SDK-path  enable Windows VSS support in QEMU Guest Agent
@@ -1820,6 +1827,25 @@ EOF
 fi
 
 ##########################################
+# bzip2 check
+
+if test "$bzip2" != "no" ; then
+    cat > $TMPC << EOF
+#include <bzlib.h>
+int main(void) { BZ2_bzlibVersion(); return 0; }
+EOF
+    if compile_prog "" "-lbz2" ; then
+        libs_softmmu="$libs_softmmu -lbz2"
+        bzip2="yes"
+    else
+        if test "$bzip2" = "yes"; then
+            feature_not_found "libbzip2" "Install libbzip2 devel"
+        fi
+        bzip2="no"
+    fi
+fi
+
+##########################################
 # libseccomp check
 
 if test "$seccomp" != "no" ; then
@@ -4340,6 +4366,7 @@ echo "vhdx              $vhdx"
 echo "Quorum            $quorum"
 echo "lzo support       $lzo"
 echo "snappy support    $snappy"
+echo "bzip2 support     $bzip2"
 echo "NUMA host support $numa"
 
 if test "$sdl_too_old" = "yes"; then
@@ -4695,6 +4722,10 @@ if test "$snappy" = "yes" ; then
   echo "CONFIG_SNAPPY=y" >> $config_host_mak
 fi
 
+if test "$bzip2" = "yes" ; then
+  echo "CONFIG_BZIP2=y" >> $config_host_mak
+fi
+
 if test "$libiscsi" = "yes" ; then
   echo "CONFIG_LIBISCSI=m" >> $config_host_mak
   echo "LIBISCSI_CFLAGS=$libiscsi_cflags" >> $config_host_mak
-- 
2.2.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* [Qemu-devel] [PATCH 10/10] block/dmg: improve zeroes handling
  2014-12-27 15:01 [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support Peter Wu
                   ` (8 preceding siblings ...)
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 09/10] block/dmg: support bzip2 block entry types Peter Wu
@ 2014-12-27 15:01 ` Peter Wu
  2015-01-05 19:48   ` John Snow
  2015-01-02 14:14 ` [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support Stefan Hajnoczi
  10 siblings, 1 reply; 39+ messages in thread
From: Peter Wu @ 2014-12-27 15:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi

Disk images may contain large all-zeroes gaps (1.66k sectors or 812 MiB
is seen in the real world). These blocks (type 2) do not need to be
extracted into a temporary buffer, there is no need to allocate memory
for these blocks nor to check its length.

(For the test image, the maximum uncompressed size is 1054371 bytes,
probably for a bzip2-compressed block.)

Signed-off-by: Peter Wu <peter@lekensteyn.nl>
---
 block/dmg.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/block/dmg.c b/block/dmg.c
index 67d4e2b..b84ba7e 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -137,7 +137,9 @@ static void update_max_chunk_size(BDRVDMGState *s, uint32_t chunk,
         uncompressed_sectors = (s->lengths[chunk] + 511) / 512;
         break;
     case 2: /* zero */
-        uncompressed_sectors = s->sectorcounts[chunk];
+        /* as the all-zeroes block may be large, it is treated specially: the
+         * sector is not copied from a large buffer, a simple memset is used
+         * instead. Therefore uncompressed_sectors does not need to be set. */
         break;
     }
 
@@ -260,7 +262,9 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds,
         s->sectorcounts[i] = buff_read_uint64(buffer, offset);
         offset += 8;
 
-        if (s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
+        /* all-zeroes sector (type 2) does not need to be "uncompressed" and can
+         * therefore be unbounded. */
+        if (s->types[i] != 2 && s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
             error_report("sector count %" PRIu64 " for chunk %" PRIu32
                          " is larger than max (%u)",
                          s->sectorcounts[i], i, DMG_SECTORCOUNTS_MAX);
@@ -621,9 +625,6 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
                 return -1;
             }
             break;
-        case 2: /* zero */
-            memset(s->uncompressed_chunk, 0, 512 * s->sectorcounts[chunk]);
-            break;
         }
         s->current_chunk = chunk;
     }
@@ -641,6 +642,13 @@ static int dmg_read(BlockDriverState *bs, int64_t sector_num,
         if (dmg_read_chunk(bs, sector_num + i) != 0) {
             return -1;
         }
+        /* Special case: current chunk is all zeroes. Do not perform a memcpy as
+         * s->uncompressed_chunk may be too small to cover the large all-zeroes
+         * section. */
+        if (s->types[s->current_chunk] == 2) { /* all zeroes block entry */
+            memset(buf + i * 512, 0, 512);
+            continue;
+        }
         sector_offset_in_chunk = sector_num + i - s->sectors[s->current_chunk];
         memcpy(buf + i * 512,
                s->uncompressed_chunk + sector_offset_in_chunk * 512, 512);
-- 
2.2.1

^ permalink raw reply related	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support
  2014-12-27 15:01 [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support Peter Wu
                   ` (9 preceding siblings ...)
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 10/10] block/dmg: improve zeroes handling Peter Wu
@ 2015-01-02 14:14 ` Stefan Hajnoczi
  2015-01-02 16:31   ` John Snow
  10 siblings, 1 reply; 39+ messages in thread
From: Stefan Hajnoczi @ 2015-01-02 14:14 UTC (permalink / raw)
  To: Peter Wu; +Cc: Kevin Wolf, John Snow, qemu-devel, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 390 bytes --]

On Sat, Dec 27, 2014 at 04:01:34PM +0100, Peter Wu wrote:
> These series improve QEMU support for DMG image files:

Hi,
Thanks for this patch series.  Kevin and I consider patches for merging
after they have a Reviewed-by: from at least 1 other QEMU contributor.

I have CCed John Snow.

John: If you are busy, please CC someone else or let us know so this
series can get reviewed.

Stefan

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support
  2015-01-02 14:14 ` [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support Stefan Hajnoczi
@ 2015-01-02 16:31   ` John Snow
  2015-01-02 18:46     ` Peter Wu
  0 siblings, 1 reply; 39+ messages in thread
From: John Snow @ 2015-01-02 16:31 UTC (permalink / raw)
  To: Stefan Hajnoczi, Peter Wu; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi



On 01/02/2015 09:14 AM, Stefan Hajnoczi wrote:
> On Sat, Dec 27, 2014 at 04:01:34PM +0100, Peter Wu wrote:
>> These series improve QEMU support for DMG image files:
>
> Hi,
> Thanks for this patch series.  Kevin and I consider patches for merging
> after they have a Reviewed-by: from at least 1 other QEMU contributor.
>
> I have CCed John Snow.
>
> John: If you are busy, please CC someone else or let us know so this
> series can get reviewed.
>
> Stefan
>

Just recomposing myself post-vacation, I will start looking this over today.

--John

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support
  2015-01-02 16:31   ` John Snow
@ 2015-01-02 18:46     ` Peter Wu
  2015-01-02 18:58       ` John Snow
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Wu @ 2015-01-02 18:46 UTC (permalink / raw)
  To: John Snow, Stefan Hajnoczi; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

FYI, I plan to make some more changes:

- do not require offset ≠ 0 for resource fork and XML offsets. Technically it is allowed, do you agree on this change?

- improve offset checking https://git.lekensteyn.nl/peter/qemu/commit/?h=block-dmg-2.3&id=41fd83773361923f668f54796ff563660b77e96c (squash with the existing length checking patch)

- (not part of this series, but for future consideration) read errors currently return 1 (EPERM). EIO or EINVAL would probably a better choice depending on the error type.

Other than that, the patches should be ready for review. Thank you in advance.

Kind regards,
Peter
https://lekensteyn.nl
(pardon my brevity, top-posting and formatting, sent from my phone)


On January 2, 2015 5:31:33 PM CET, John Snow <jsnow@redhat.com> wrote:
>
>
>On 01/02/2015 09:14 AM, Stefan Hajnoczi wrote:
>> On Sat, Dec 27, 2014 at 04:01:34PM +0100, Peter Wu wrote:
>>> These series improve QEMU support for DMG image files:
>>
>> Hi,
>> Thanks for this patch series.  Kevin and I consider patches for
>merging
>> after they have a Reviewed-by: from at least 1 other QEMU
>contributor.
>>
>> I have CCed John Snow.
>>
>> John: If you are busy, please CC someone else or let us know so this
>> series can get reviewed.
>>
>> Stefan
>>
>
>Just recomposing myself post-vacation, I will start looking this over
>today.
>
>--John

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support
  2015-01-02 18:46     ` Peter Wu
@ 2015-01-02 18:58       ` John Snow
  2015-01-02 21:49         ` Peter Wu
  0 siblings, 1 reply; 39+ messages in thread
From: John Snow @ 2015-01-02 18:58 UTC (permalink / raw)
  To: Peter Wu; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel



On 01/02/2015 01:46 PM, Peter Wu wrote:
> FYI, I plan to make some more changes:
>
> - do not require offset ≠ 0 for resource fork and XML offsets. Technically it is allowed, do you agree on this change?

If you have seen this in the wild, I definitely agree. If you haven't, I 
am not against the change, but there's likely no hurry to include it in 
this series if the changes are not simple.

> - improve offset checking https://git.lekensteyn.nl/peter/qemu/commit/?h=block-dmg-2.3&id=41fd83773361923f668f54796ff563660b77e96c (squash with the existing length checking patch)
>
> - (not part of this series, but for future consideration) read errors currently return 1 (EPERM). EIO or EINVAL would probably a better choice depending on the error type.
>
> Other than that, the patches should be ready for review. Thank you in advance.
>
> Kind regards,
> Peter
> https://lekensteyn.nl
> (pardon my brevity, top-posting and formatting, sent from my phone)
>
>
> On January 2, 2015 5:31:33 PM CET, John Snow <jsnow@redhat.com> wrote:
>>
>>
>> On 01/02/2015 09:14 AM, Stefan Hajnoczi wrote:
>>> On Sat, Dec 27, 2014 at 04:01:34PM +0100, Peter Wu wrote:
>>>> These series improve QEMU support for DMG image files:
>>>
>>> Hi,
>>> Thanks for this patch series.  Kevin and I consider patches for
>> merging
>>> after they have a Reviewed-by: from at least 1 other QEMU
>> contributor.
>>>
>>> I have CCed John Snow.
>>>
>>> John: If you are busy, please CC someone else or let us know so this
>>> series can get reviewed.
>>>
>>> Stefan
>>>
>>
>> Just recomposing myself post-vacation, I will start looking this over
>> today.
>>
>> --John
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support
  2015-01-02 18:58       ` John Snow
@ 2015-01-02 21:49         ` Peter Wu
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Wu @ 2015-01-02 21:49 UTC (permalink / raw)
  To: John Snow; +Cc: Kevin Wolf, Stefan Hajnoczi, qemu-devel

On Friday 02 January 2015 13:58:33 John Snow wrote:
> 
> On 01/02/2015 01:46 PM, Peter Wu wrote:
> > FYI, I plan to make some more changes:
> >
> > - do not require offset ≠ 0 for resource fork and XML offsets.
> > Technically it is allowed, do you agree on this change?
> 
> If you have seen this in the wild, I definitely agree. If you haven't, I 
> am not against the change, but there's likely no hurry to include it in 
> this series if the changes are not simple.

It would involve only a removal of "rsrc_fork_offset != 0 && " in patch
3 and "plist_xml_offset != 0 && " in patch 5. I have not seen it in the
real world, only when trying to construct a dmg file by hand for testing
purposes. The change is simple and can be squashed in the patch.

It makes sense since previously only the offset was checked. Now the
length is checked instead. Before:

    /* read offset */
    ret = read_uint64(bs, offset, &info_begin);
    if (ret < 0) {
        goto fail;
    } else if (info_begin == 0) {
        /* assume invalid file when offset is zero */
        ret = -EINVAL;
        goto fail;
    } 

After (in current patch series):

    /* offset of resource fork (RsrcForkOffset) */
    ret = read_uint64(bs, offset + 0x28, &rsrc_fork_offset);
    if (ret < 0) {
        goto fail;
    }
    ret = read_uint64(bs, offset + 0x30, &rsrc_fork_length);
    if (ret < 0) {
        goto fail;
    }
    // ...
    if (rsrc_fork_offset != 0 && rsrc_fork_length != 0) {
        ret = dmg_read_resource_fork(bs, &ds,
    

In the current patch series both the offset and lengths are checked, but
it is sufficient to look at just the length.

Kind regards,
Peter

> > - improve offset checking
> > https://git.lekensteyn.nl/peter/qemu/commit/?h=block-dmg-2.3&id=41fd83773361923f668f54796ff563660b77e96c
> > (squash with the existing length checking patch)
> >
> > - (not part of this series, but for future consideration) read
> > errors currently return 1 (EPERM). EIO or EINVAL would probably a
> > better choice depending on the error type.
> >
> > Other than that, the patches should be ready for review. Thank you
> > in advance.
> >
> > Kind regards,
> > Peter
> > https://lekensteyn.nl
> > (pardon my brevity, top-posting and formatting, sent from my phone)
> >
> >
> > On January 2, 2015 5:31:33 PM CET, John Snow <jsnow@redhat.com> wrote:
> >>
> >>
> >> On 01/02/2015 09:14 AM, Stefan Hajnoczi wrote:
> >>> On Sat, Dec 27, 2014 at 04:01:34PM +0100, Peter Wu wrote:
> >>>> These series improve QEMU support for DMG image files:
> >>>
> >>> Hi,
> >>> Thanks for this patch series.  Kevin and I consider patches for
> >> merging
> >>> after they have a Reviewed-by: from at least 1 other QEMU
> >> contributor.
> >>>
> >>> I have CCed John Snow.
> >>>
> >>> John: If you are busy, please CC someone else or let us know so this
> >>> series can get reviewed.
> >>>
> >>> Stefan
> >>>
> >>
> >> Just recomposing myself post-vacation, I will start looking this over
> >> today.
> >>
> >> --John

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] block/dmg: properly detect the UDIF trailer
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 01/10] block/dmg: properly detect the UDIF trailer Peter Wu
@ 2015-01-02 23:58   ` John Snow
  2015-01-03  9:39     ` Peter Wu
  2015-01-06 13:35   ` Stefan Hajnoczi
  1 sibling, 1 reply; 39+ messages in thread
From: John Snow @ 2015-01-02 23:58 UTC (permalink / raw)
  To: Peter Wu, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi



On 12/27/2014 10:01 AM, Peter Wu wrote:
> DMG files have a variable length with a UDIF trailer at the end of a
> file. This UDIF trailer is essential as it describes the contents of
> the image. At the moment however, the start of this trailer is almost
> always incorrect as bdrv_getlength() returns a multiple of the block
> size (rounded up). This results in a failure to recognize DMG files,
> resulting in Invalid argument (EINVAL) errors.
>
> As there is no API to retrieve the real file size, look for the magic
> header in the last two sectors to find the start of this 512-byte UDIF
> trailer (the "koly" block).
>
> The resource fork offset ("info_begin") has its offset adjusted as the
> initial value of offset does not mean "end of file" anymore, but "begin
> of UDIF trailer".
>
> Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> ---
>   block/dmg.c | 40 ++++++++++++++++++++++++++++++++++++----
>   1 file changed, 36 insertions(+), 4 deletions(-)
>
> diff --git a/block/dmg.c b/block/dmg.c
> index e455886..df274f9 100644
> --- a/block/dmg.c
> +++ b/block/dmg.c
> @@ -131,6 +131,39 @@ static void update_max_chunk_size(BDRVDMGState *s, uint32_t chunk,
>       }
>   }
>
> +static int64_t dmg_find_koly_offset(BlockDriverState *file_bs)
> +{
> +    int64_t length;
> +    int64_t offset = 0;
> +    uint8_t buffer[515];
> +    int i, ret;
> +
> +    /* bdrv_getlength returns a multiple of block size (512), rounded up. Since
> +     * dmg images can have odd sizes, try to look for the "koly" magic which
> +     * marks the begin of the UDIF trailer (512 bytes). This magic can be found
> +     * in the last 511 bytes of the second-last sector or the first 4 bytes of
> +     * the last sector (search space: 515 bytes) */
> +    length = bdrv_getlength(file_bs);
> +    if (length < 512) {
> +        return length < 0 ? length : -EINVAL;
> +    }
> +    if (length > 511 + 512) {
> +        offset = length - 511 - 512;
> +    }
> +    length = length < 515 ? length : 515;
> +    ret = bdrv_pread(file_bs, offset, buffer, length);
> +    if (ret < 4) {
> +        return ret < 0 ? ret : -EINVAL;
> +    }
> +    for (i = 0; i < length - 3; i++) {
> +        if (buffer[i] == 'k' && buffer[i+1] == 'o' &&
> +            buffer[i+2] == 'l' && buffer[i+3] == 'y') {
> +            return offset + i;
> +        }
> +    }
> +    return -EINVAL;
> +}
> +
>   static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
>                       Error **errp)
>   {
> @@ -145,15 +178,14 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
>       s->n_chunks = 0;
>       s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
>
> -    /* read offset of info blocks */
> -    offset = bdrv_getlength(bs->file);
> +    /* locate the UDIF trailer */
> +    offset = dmg_find_koly_offset(bs->file);
>       if (offset < 0) {
>           ret = offset;
>           goto fail;
>       }
> -    offset -= 0x1d8;
>
> -    ret = read_uint64(bs, offset, &info_begin);
> +    ret = read_uint64(bs, offset + 0x28, &info_begin);
>       if (ret < 0) {
>           goto fail;
>       } else if (info_begin == 0) {
>

If there really is no convenient way to retrieve the real length ...
(Stefan: Would that be difficult to add?)

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] block/dmg: extract mish block decoding functionality
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 02/10] block/dmg: extract mish block decoding functionality Peter Wu
@ 2015-01-02 23:59   ` John Snow
  2015-01-03 11:05     ` Peter Wu
  2015-01-06 13:42   ` Stefan Hajnoczi
  1 sibling, 1 reply; 39+ messages in thread
From: John Snow @ 2015-01-02 23:59 UTC (permalink / raw)
  To: Peter Wu, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi



On 12/27/2014 10:01 AM, Peter Wu wrote:
> Extract the mish block decoder such that this can be used for other
> formats in the future. A new DmgHeaderState struct is introduced to
> share state while decoding.
>
> The code is kept unchanged as much as possible, a "fail" label is added
> for example where a simple return would probably do.
>
> Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> ---
>   block/dmg.c | 216 +++++++++++++++++++++++++++++++++++-------------------------
>   1 file changed, 125 insertions(+), 91 deletions(-)
>
> diff --git a/block/dmg.c b/block/dmg.c
> index df274f9..6dc6dbb 100644
> --- a/block/dmg.c
> +++ b/block/dmg.c
> @@ -164,19 +164,137 @@ static int64_t dmg_find_koly_offset(BlockDriverState *file_bs)
>       return -EINVAL;
>   }
>
> +/* used when building the sector table */
> +typedef struct DmgHeaderState {
> +    /* used internally by dmg_read_mish_block to remember offsets of blocks
> +     * across calls */
> +    uint64_t last_in_offset;
> +    uint64_t last_out_offset;
> +    /* exported for dmg_open */
> +    uint32_t max_compressed_size;
> +    uint32_t max_sectors_per_chunk;
> +} DmgHeaderState;
> +
> +static int dmg_read_mish_block(BlockDriverState *bs, DmgHeaderState *ds,
> +                               int64_t offset, uint32_t count)
> +{
> +    BDRVDMGState *s = bs->opaque;
> +    uint32_t type, i;
> +    int ret;
> +    size_t new_size;
> +    uint32_t chunk_count;
> +
> +    ret = read_uint32(bs, offset, &type);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    /* skip data that is not a valid MISH block (invalid magic or too small) */
> +    if (type != 0x6d697368 || count < 244) {
> +        /* assume success for now */
> +        return 0;
> +    }
> +
> +    offset += 4;
> +    offset += 200;
> +
> +    chunk_count = (count - 204) / 40;
> +    new_size = sizeof(uint64_t) * (s->n_chunks + chunk_count);
> +    s->types = g_realloc(s->types, new_size / 2);
> +    s->offsets = g_realloc(s->offsets, new_size);
> +    s->lengths = g_realloc(s->lengths, new_size);
> +    s->sectors = g_realloc(s->sectors, new_size);
> +    s->sectorcounts = g_realloc(s->sectorcounts, new_size);
> +
> +    for (i = s->n_chunks; i < s->n_chunks + chunk_count; i++) {
> +        ret = read_uint32(bs, offset, &s->types[i]);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +        offset += 4;
> +        if (s->types[i] != 0x80000005 && s->types[i] != 1 &&
> +            s->types[i] != 2) {
> +            if (s->types[i] == 0xffffffff && i > 0) {
> +                ds->last_in_offset = s->offsets[i - 1] + s->lengths[i - 1];
> +                ds->last_out_offset = s->sectors[i - 1] +
> +                                      s->sectorcounts[i - 1];
> +            }
> +            chunk_count--;
> +            i--;
> +            offset += 36;
> +            continue;
> +        }
> +        offset += 4;
> +
> +        ret = read_uint64(bs, offset, &s->sectors[i]);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +        s->sectors[i] += ds->last_out_offset;
> +        offset += 8;
> +
> +        ret = read_uint64(bs, offset, &s->sectorcounts[i]);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +        offset += 8;
> +
> +        if (s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
> +            error_report("sector count %" PRIu64 " for chunk %" PRIu32
> +                         " is larger than max (%u)",
> +                         s->sectorcounts[i], i, DMG_SECTORCOUNTS_MAX);
> +            ret = -EINVAL;
> +            goto fail;
> +        }
> +
> +        ret = read_uint64(bs, offset, &s->offsets[i]);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +        s->offsets[i] += ds->last_in_offset;
> +        offset += 8;
> +
> +        ret = read_uint64(bs, offset, &s->lengths[i]);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +        offset += 8;
> +
> +        if (s->lengths[i] > DMG_LENGTHS_MAX) {
> +            error_report("length %" PRIu64 " for chunk %" PRIu32
> +                         " is larger than max (%u)",
> +                         s->lengths[i], i, DMG_LENGTHS_MAX);
> +            ret = -EINVAL;
> +            goto fail;
> +        }
> +
> +        update_max_chunk_size(s, i, &ds->max_compressed_size,
> +                              &ds->max_sectors_per_chunk);
> +    }
> +    s->n_chunks += chunk_count;
> +    return 0;
> +
> +fail:
> +    return ret;
> +}
> +
>   static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
>                       Error **errp)
>   {
>       BDRVDMGState *s = bs->opaque;
> -    uint64_t info_begin, info_end, last_in_offset, last_out_offset;
> +    DmgHeaderState ds;
> +    uint64_t info_begin, info_end;
>       uint32_t count, tmp;
> -    uint32_t max_compressed_size = 1, max_sectors_per_chunk = 1, i;
>       int64_t offset;
>       int ret;
>
>       bs->read_only = 1;
>       s->n_chunks = 0;
>       s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
> +    ds.last_in_offset = 0;
> +    ds.last_out_offset = 0;
> +    ds.max_compressed_size = 1;
> +    ds.max_sectors_per_chunk = 1;

It might be nice to have a tiny initializer macro/function so that 
future users of functions that use this structure don't need to know the 
magic numbers.

>       /* locate the UDIF trailer */
>       offset = dmg_find_koly_offset(bs->file);
> @@ -210,13 +328,11 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
>       }
>       info_end = info_begin + count;
>
> +    /* begin of mish block */
>       offset = info_begin + 0x100;
>
>       /* read offsets */
> -    last_in_offset = last_out_offset = 0;
>       while (offset < info_end) {
> -        uint32_t type;
> -
>           ret = read_uint32(bs, offset, &count);
>           if (ret < 0) {
>               goto fail;
> @@ -226,100 +342,18 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
>           }
>           offset += 4;
>
> -        ret = read_uint32(bs, offset, &type);
> +        ret = dmg_read_mish_block(bs, &ds, offset, count);
>           if (ret < 0) {
>               goto fail;
>           }
> -
> -        if (type == 0x6d697368 && count >= 244) {
> -            size_t new_size;
> -            uint32_t chunk_count;
> -
> -            offset += 4;
> -            offset += 200;
> -
> -            chunk_count = (count - 204) / 40;
> -            new_size = sizeof(uint64_t) * (s->n_chunks + chunk_count);
> -            s->types = g_realloc(s->types, new_size / 2);
> -            s->offsets = g_realloc(s->offsets, new_size);
> -            s->lengths = g_realloc(s->lengths, new_size);
> -            s->sectors = g_realloc(s->sectors, new_size);
> -            s->sectorcounts = g_realloc(s->sectorcounts, new_size);
> -
> -            for (i = s->n_chunks; i < s->n_chunks + chunk_count; i++) {
> -                ret = read_uint32(bs, offset, &s->types[i]);
> -                if (ret < 0) {
> -                    goto fail;
> -                }
> -                offset += 4;
> -                if (s->types[i] != 0x80000005 && s->types[i] != 1 &&
> -                    s->types[i] != 2) {
> -                    if (s->types[i] == 0xffffffff && i > 0) {
> -                        last_in_offset = s->offsets[i - 1] + s->lengths[i - 1];
> -                        last_out_offset = s->sectors[i - 1] +
> -                                          s->sectorcounts[i - 1];
> -                    }
> -                    chunk_count--;
> -                    i--;
> -                    offset += 36;
> -                    continue;
> -                }
> -                offset += 4;
> -
> -                ret = read_uint64(bs, offset, &s->sectors[i]);
> -                if (ret < 0) {
> -                    goto fail;
> -                }
> -                s->sectors[i] += last_out_offset;
> -                offset += 8;
> -
> -                ret = read_uint64(bs, offset, &s->sectorcounts[i]);
> -                if (ret < 0) {
> -                    goto fail;
> -                }
> -                offset += 8;
> -
> -                if (s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
> -                    error_report("sector count %" PRIu64 " for chunk %" PRIu32
> -                                 " is larger than max (%u)",
> -                                 s->sectorcounts[i], i, DMG_SECTORCOUNTS_MAX);
> -                    ret = -EINVAL;
> -                    goto fail;
> -                }
> -
> -                ret = read_uint64(bs, offset, &s->offsets[i]);
> -                if (ret < 0) {
> -                    goto fail;
> -                }
> -                s->offsets[i] += last_in_offset;
> -                offset += 8;
> -
> -                ret = read_uint64(bs, offset, &s->lengths[i]);
> -                if (ret < 0) {
> -                    goto fail;
> -                }
> -                offset += 8;
> -
> -                if (s->lengths[i] > DMG_LENGTHS_MAX) {
> -                    error_report("length %" PRIu64 " for chunk %" PRIu32
> -                                 " is larger than max (%u)",
> -                                 s->lengths[i], i, DMG_LENGTHS_MAX);
> -                    ret = -EINVAL;
> -                    goto fail;
> -                }
> -
> -                update_max_chunk_size(s, i, &max_compressed_size,
> -                                      &max_sectors_per_chunk);
> -            }
> -            s->n_chunks += chunk_count;
> -        }
> +        offset += count;

A little justification here might be nice: I assume the "count" always 
includes the full byte count of the mish block, and the old code just 
incremented the offset up to this amount. This is the only functional 
difference in this refactor that I can see.

>       }
>
>       /* initialize zlib engine */
>       s->compressed_chunk = qemu_try_blockalign(bs->file,
> -                                              max_compressed_size + 1);
> +                                              ds.max_compressed_size + 1);
>       s->uncompressed_chunk = qemu_try_blockalign(bs->file,
> -                                                512 * max_sectors_per_chunk);
> +                                                512 * ds.max_sectors_per_chunk);
>       if (s->compressed_chunk == NULL || s->uncompressed_chunk == NULL) {
>           ret = -ENOMEM;
>           goto fail;
>

Thank you for splitting this monolithic function up into nicer pieces. 
With or without addressing my comments above:

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 03/10] block/dmg: extract processing of resource forks
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 03/10] block/dmg: extract processing of resource forks Peter Wu
@ 2015-01-03  0:01   ` John Snow
  2015-01-03 11:24     ` Peter Wu
  0 siblings, 1 reply; 39+ messages in thread
From: John Snow @ 2015-01-03  0:01 UTC (permalink / raw)
  To: Peter Wu, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi



On 12/27/2014 10:01 AM, Peter Wu wrote:
> Besides the offset, also read the resource length. This length is now
> used in the extracted function to verify the end of the resource fork
> against "count" from the resource fork.
>
> Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> ---
>   block/dmg.c | 90 ++++++++++++++++++++++++++++++++++++++++---------------------
>   1 file changed, 59 insertions(+), 31 deletions(-)
>
> diff --git a/block/dmg.c b/block/dmg.c
> index 6dc6dbb..7f49388 100644
> --- a/block/dmg.c
> +++ b/block/dmg.c
> @@ -278,38 +278,13 @@ fail:
>       return ret;
>   }
>
> -static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
> -                    Error **errp)
> +static int dmg_read_resource_fork(BlockDriverState *bs, DmgHeaderState *ds,
> +                                  uint64_t info_begin, uint64_t info_length)
>   {
> -    BDRVDMGState *s = bs->opaque;
> -    DmgHeaderState ds;
> -    uint64_t info_begin, info_end;
> -    uint32_t count, tmp;
> -    int64_t offset;
>       int ret;
> -
> -    bs->read_only = 1;
> -    s->n_chunks = 0;
> -    s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
> -    ds.last_in_offset = 0;
> -    ds.last_out_offset = 0;
> -    ds.max_compressed_size = 1;
> -    ds.max_sectors_per_chunk = 1;
> -
> -    /* locate the UDIF trailer */
> -    offset = dmg_find_koly_offset(bs->file);
> -    if (offset < 0) {
> -        ret = offset;
> -        goto fail;
> -    }
> -
> -    ret = read_uint64(bs, offset + 0x28, &info_begin);
> -    if (ret < 0) {
> -        goto fail;
> -    } else if (info_begin == 0) {
> -        ret = -EINVAL;
> -        goto fail;
> -    }
> +    uint32_t count, tmp;
> +    uint64_t info_end;
> +    uint64_t offset;
>
>       ret = read_uint32(bs, info_begin, &tmp);
>       if (ret < 0) {
> @@ -326,6 +301,10 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
>           ret = -EINVAL;
>           goto fail;
>       }
> +    if (count > info_length) {
> +        ret = -EINVAL;
> +        goto fail;
> +    }
>       info_end = info_begin + count;
>
>       /* begin of mish block */
> @@ -342,12 +321,61 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
>           }
>           offset += 4;
>
> -        ret = dmg_read_mish_block(bs, &ds, offset, count);
> +        ret = dmg_read_mish_block(bs, ds, offset, count);
>           if (ret < 0) {
>               goto fail;
>           }
>           offset += count;
>       }
> +    return 0;
> +
> +fail:
> +    return ret;
> +}
> +
> +static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
> +                    Error **errp)
> +{
> +    BDRVDMGState *s = bs->opaque;
> +    DmgHeaderState ds;
> +    uint64_t rsrc_fork_offset, rsrc_fork_length;
> +    int64_t offset;
> +    int ret;
> +
> +    bs->read_only = 1;
> +    s->n_chunks = 0;
> +    s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
> +    ds.last_in_offset = 0;
> +    ds.last_out_offset = 0;
> +    ds.max_compressed_size = 1;
> +    ds.max_sectors_per_chunk = 1;
> +
> +    /* locate the UDIF trailer */
> +    offset = dmg_find_koly_offset(bs->file);
> +    if (offset < 0) {
> +        ret = offset;
> +        goto fail;
> +    }
> +
> +    /* offset of resource fork (RsrcForkOffset) */
> +    ret = read_uint64(bs, offset + 0x28, &rsrc_fork_offset);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +    ret = read_uint64(bs, offset + 0x30, &rsrc_fork_length);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +    if (rsrc_fork_offset != 0 && rsrc_fork_length != 0) {
> +        ret = dmg_read_resource_fork(bs, &ds,
> +                                     rsrc_fork_offset, rsrc_fork_length);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +    } else {
> +        ret = -EINVAL;
> +        goto fail;
> +    }
>
>       /* initialize zlib engine */
>       s->compressed_chunk = qemu_try_blockalign(bs->file,
>

I know your refactor hardly touches the code here, but this is a good 
place to mention it:

 From what I can tell from the Resource Fork format
https://developer.apple.com/legacy/library/documentation/mac/pdf/MoreMacintoshToolbox.pdf#page=151

there are four fields of interest in the header of the resourcefork 
before we start looping trying to find mish data:

uint32_t data_offset;
uint32_t map_offset;
uint32_t data_length;
uint32_t map_length;

We are treating the map which comes right after the data as the "data 
length" which is not too far from the truth, but may include extra bytes.

e.g., in the vlc dmg you linked:

The koly header advertises a Resource Fork length of 93276 bytes.
the resource fork itself has a header of:

data_offset := 256
map_offset := 92860	// not length!
data_length := 92604
map_length := 416


Since we are naming this function explicitly after the Resource Fork, we 
should probably tidy this up just a little bit to be clearer at the very 
least.

The space reserved for system use, here asserted by QEMU to always be 
0x100, is technically variable and perhaps could be adjusted as well 
with minimal changes.

A comment pointing to this documentation for the format here would also 
be helpful, as your other source details koly and mish, but Resource 
Fork was missing.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 04/10] block/dmg: process a buffer instead of reading ints
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 04/10] block/dmg: process a buffer instead of reading ints Peter Wu
@ 2015-01-03  0:01   ` John Snow
  0 siblings, 0 replies; 39+ messages in thread
From: John Snow @ 2015-01-03  0:01 UTC (permalink / raw)
  To: Peter Wu, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi



On 12/27/2014 10:01 AM, Peter Wu wrote:
> As the decoded plist XML is not a pointer in the file,
> dmg_read_mish_block must be able to process a buffer instead of a file
> pointer. Since the full buffer must be processed, let's change the
> return value again to just a success flag.
>
> Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> ---
>   block/dmg.c | 60 ++++++++++++++++++++++++++++++------------------------------
>   1 file changed, 30 insertions(+), 30 deletions(-)
>
> diff --git a/block/dmg.c b/block/dmg.c
> index 7f49388..75e771a 100644
> --- a/block/dmg.c
> +++ b/block/dmg.c

[snip]

I see; because your plist XML implementation has to decode base64 data, 
this function needs to be able to operate on a buffer instead of taking 
an offset into the file, because we may need to do some processing on 
the mish data first.

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 05/10] block/dmg: validate chunk size to avoid overflow
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 05/10] block/dmg: validate chunk size to avoid overflow Peter Wu
@ 2015-01-03  0:02   ` John Snow
  0 siblings, 0 replies; 39+ messages in thread
From: John Snow @ 2015-01-03  0:02 UTC (permalink / raw)
  To: Peter Wu, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi



On 12/27/2014 10:01 AM, Peter Wu wrote:
> Previously the chunk size was not checked, allowing for a large memory
> allocation. This patch checks whether the chunks size is within the
> resource fork length.
>
> Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> ---
>   block/dmg.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/dmg.c b/block/dmg.c
> index 75e771a..19e4fe2 100644
> --- a/block/dmg.c
> +++ b/block/dmg.c
> @@ -308,7 +308,7 @@ static int dmg_read_resource_fork(BlockDriverState *bs, DmgHeaderState *ds,
>           ret = read_uint32(bs, offset, &count);
>           if (ret < 0) {
>               goto fail;
> -        } else if (count == 0) {
> +        } else if (count == 0 || count > info_end - offset) {
>               ret = -EINVAL;
>               goto fail;
>           }
>

As mentioned already, please squash your latest changes in your git 
repository into this patch for v2.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 06/10] block/dmg: process XML plists
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 06/10] block/dmg: process XML plists Peter Wu
@ 2015-01-03  0:04   ` John Snow
  2015-01-03 11:54     ` Peter Wu
  2015-01-05 16:54   ` John Snow
  1 sibling, 1 reply; 39+ messages in thread
From: John Snow @ 2015-01-03  0:04 UTC (permalink / raw)
  To: Peter Wu, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi



On 12/27/2014 10:01 AM, Peter Wu wrote:
> The format is simple enough to avoid using a full-blown XML parser.
> The offsets are based on the description at
> http://newosxbook.com/DMG.html
>
> Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> ---
>   block/dmg.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 69 insertions(+)
>
> diff --git a/block/dmg.c b/block/dmg.c
> index 19e4fe2..c03ea01 100644
> --- a/block/dmg.c
> +++ b/block/dmg.c
> @@ -26,6 +26,7 @@
>   #include "qemu/bswap.h"
>   #include "qemu/module.h"
>   #include <zlib.h>
> +#include <glib.h>
>
>   enum {
>       /* Limit chunk sizes to prevent unreasonable amounts of memory being used
> @@ -333,12 +334,66 @@ fail:
>       return ret;
>   }
>
> +static int dmg_read_plist_xml(BlockDriverState *bs, DmgHeaderState *ds,
> +                              uint64_t info_begin, uint64_t info_length)
> +{
> +    BDRVDMGState *s = bs->opaque;
> +    int ret;
> +    uint8_t *buffer = NULL;
> +    char *data_begin, *data_end;
> +
> +    /* Have at least some length to avoid NULL for g_malloc. Attempt to set a
> +     * safe upper cap on the data length. A test sample had a XML length of
> +     * about 1 MiB. */
> +    if (info_length == 0 || info_length > 16 * 1024 * 1024) {
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    buffer = g_malloc(info_length + 1);
> +    buffer[info_length] = '\0';
> +    ret = bdrv_pread(bs->file, info_begin, buffer, info_length);
> +    if (ret != info_length) {
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    /* look for <data>...</data>. The data is 284 (0x11c) bytes after base64
> +     * decode. The actual data element has 431 (0x1af) bytes which includes tabs
> +     * and line feeds. */
> +    data_end = (char *)buffer;
> +    while ((data_begin = strstr(data_end, "<data>")) != NULL) {
> +        gsize out_len = 0;
> +
> +        data_begin += 6;
> +        data_end = strstr(data_begin, "</data>");
> +        /* malformed XML? */
> +        if (data_end == NULL) {
> +            ret = -EINVAL;
> +            goto fail;
> +        }
> +        *data_end++ = '\0';
> +        g_base64_decode_inplace(data_begin, &out_len);
> +        ret = dmg_read_mish_block(s, ds, (uint8_t *)data_begin,
> +                                  (uint32_t)out_len);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +    }
> +    ret = 0;
> +
> +fail:
> +    g_free(buffer);
> +    return ret;
> +}
> +

This starts to make me a little nervous, because we're ignoring so much 
of the XML document structure here and just effectively performing a 
regular search for "<data>(.*)</data>".

Can we guarantee that the ONLY time the data element is used in this 
document is when it is being used in the exact context we are expecting 
here, where it contains the b64 mish data we expect it to?

i.e. it is always in a path like this as detailed by 
http://newosxbook.com/DMG.html :

plist/dict/key[text()='resource-fork']/following-sibling::dict/key[text()='blkx']/following-sibling::array/dict/key[text()='data']/following-sibling::data

I notice that this document says other sections MAY be present, do any 
of them ever need to be parsed? Has anyone written about them before?

Do we know if any use data sections?

I suppose at the very least, sections of interest are always going to 
include the "mish" magic, so that should probably keep us from doing 
anything too stupid ...

>   static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
>                       Error **errp)
>   {
>       BDRVDMGState *s = bs->opaque;
>       DmgHeaderState ds;
>       uint64_t rsrc_fork_offset, rsrc_fork_length;
> +    uint64_t plist_xml_offset, plist_xml_length;
>       int64_t offset;
>       int ret;
>
> @@ -366,12 +421,26 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
>       if (ret < 0) {
>           goto fail;
>       }
> +    /* offset of property list (XMLOffset) */
> +    ret = read_uint64(bs, offset + 0xd8, &plist_xml_offset);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +    ret = read_uint64(bs, offset + 0xe0, &plist_xml_length);
> +    if (ret < 0) {
> +        goto fail;
> +    }
>       if (rsrc_fork_offset != 0 && rsrc_fork_length != 0) {
>           ret = dmg_read_resource_fork(bs, &ds,
>                                        rsrc_fork_offset, rsrc_fork_length);
>           if (ret < 0) {
>               goto fail;
>           }
> +    } else if (plist_xml_offset != 0 && plist_xml_length != 0) {
> +        ret = dmg_read_plist_xml(bs, &ds, plist_xml_offset, plist_xml_length);
> +        if (ret < 0) {
> +            goto fail;
> +        }
>       } else {
>           ret = -EINVAL;
>           goto fail;
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 07/10] block/dmg: set virtual size to a non-zero value
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 07/10] block/dmg: set virtual size to a non-zero value Peter Wu
@ 2015-01-03  0:04   ` John Snow
  0 siblings, 0 replies; 39+ messages in thread
From: John Snow @ 2015-01-03  0:04 UTC (permalink / raw)
  To: Peter Wu, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi



On 12/27/2014 10:01 AM, Peter Wu wrote:
> Right now the virtual size is always reported as zero which makes it
> impossible to convert between formats.
>
> After this patch, the number of sectors will be read from the BLXX
> ("mish") header.
>

Do you mean to say from the 'koly' header?

http://en.wikipedia.org/wiki/Apple_Disk_Image#UDIF_data_format


> To verify the behavior, the output of `dmg2img foo.dmg foo.img` was
> compared against `qemu-img convert -f dmg -O raw foo.dmg foo.raw`. The
> tests showed that the file contents are exactly the same, except that
> QEMU creates a slightly larger file (it matches the total sectors
> count).
>
> Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> ---
>   block/dmg.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
>
> diff --git a/block/dmg.c b/block/dmg.c
> index c03ea01..984997f 100644
> --- a/block/dmg.c
> +++ b/block/dmg.c
> @@ -430,6 +430,14 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
>       if (ret < 0) {
>           goto fail;
>       }
> +    ret = read_uint64(bs, offset + 0x1ec, (uint64_t *)&bs->total_sectors);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +    if (bs->total_sectors < 0) {
> +        ret = -EINVAL;
> +        goto fail;
> +    }
>       if (rsrc_fork_offset != 0 && rsrc_fork_length != 0) {
>           ret = dmg_read_resource_fork(bs, &ds,
>                                        rsrc_fork_offset, rsrc_fork_length);
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 08/10] block/dmg: fix sector data offset calculation
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 08/10] block/dmg: fix sector data offset calculation Peter Wu
@ 2015-01-03  0:05   ` John Snow
  2015-01-03 12:47     ` Peter Wu
  0 siblings, 1 reply; 39+ messages in thread
From: John Snow @ 2015-01-03  0:05 UTC (permalink / raw)
  To: Peter Wu, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi



On 12/27/2014 10:01 AM, Peter Wu wrote:
> This patch addresses two issues:
>
>   - The data fork offset was not taken into account, resulting in failure
>     to read an InstallESD.dmg file (5164763151 bytes) which had a
>     non-zero DataForkOffset field.
>   - The offset of the previous block ("partition") was unconditionally
>     added to the current block because older files would start the input
>     offset of a new block at zero. Newer files (including vlc-2.1.5.dmg,
>     tuxpaint-0.9.15-macosx.dmg and OS X Yosemite [MAS].dmg) failed in
>     reads because these files have a continuous input offset.
>

What does "continuous input offset" mean? This change is not as clear to 
me, see below.

> Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> ---
>   block/dmg.c | 16 +++++++++++++++-
>   1 file changed, 15 insertions(+), 1 deletion(-)
>
> diff --git a/block/dmg.c b/block/dmg.c
> index 984997f..93b597f 100644
> --- a/block/dmg.c
> +++ b/block/dmg.c
> @@ -179,6 +179,7 @@ static int64_t dmg_find_koly_offset(BlockDriverState *file_bs)
>   typedef struct DmgHeaderState {
>       /* used internally by dmg_read_mish_block to remember offsets of blocks
>        * across calls */
> +    uint64_t data_fork_offset;
>       uint64_t last_in_offset;
>       uint64_t last_out_offset;
>       /* exported for dmg_open */
> @@ -194,6 +195,7 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds,
>       size_t new_size;
>       uint32_t chunk_count;
>       int64_t offset = 0;
> +    uint64_t in_offset = ds->data_fork_offset;
>
>       type = buff_read_uint32(buffer, offset);
>       /* skip data that is not a valid MISH block (invalid magic or too small) */
> @@ -246,7 +248,12 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds,
>           }
>
>           s->offsets[i] = buff_read_uint64(buffer, offset);
> -        s->offsets[i] += ds->last_in_offset;
> +        /* If this offset is below the previous chunk end, then assume that all
> +         * following offsets are after the previous chunks. */
> +        if (s->offsets[i] + in_offset < ds->last_in_offset) {
> +            in_offset = ds->last_in_offset;
> +        }
> +        s->offsets[i] += in_offset;

I take it that all of the offsets referenced in the mish structures are 
relative to the start of the data fork block, which is why we are taking 
a value from the koly block and applying it to mish block values.

correct?

>           offset += 8;
>
>           s->lengths[i] = buff_read_uint64(buffer, offset);
> @@ -400,6 +407,7 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
>       bs->read_only = 1;
>       s->n_chunks = 0;
>       s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
> +    ds.data_fork_offset = 0;
>       ds.last_in_offset = 0;
>       ds.last_out_offset = 0;
>       ds.max_compressed_size = 1;
> @@ -412,6 +420,12 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
>           goto fail;
>       }
>
> +    /* offset of data fork (DataForkOffset) */
> +    ret = read_uint64(bs, offset + 0x18, &ds.data_fork_offset);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
>       /* offset of resource fork (RsrcForkOffset) */
>       ret = read_uint64(bs, offset + 0x28, &rsrc_fork_offset);
>       if (ret < 0) {
>

A general question here:

Are we ever reading the preamble of the mish block? I see we are reading 
the 'n' items of 40-byte chunk data, but is there a reason we skip the 
first 200 bytes of mish data, or have I misread the documents on DMG 
that exist?

It looks like there are some good fields here: SectorNumber, 
SectorCount, DataOffset, and BlockDescriptors -- can these not be used 
to provide a more explicit error-checking of offsets, allowing us to 
make less assumptions about where these blocks begin and end?

Is there some reason they are unreliable?

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] block/dmg: properly detect the UDIF trailer
  2015-01-02 23:58   ` John Snow
@ 2015-01-03  9:39     ` Peter Wu
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Wu @ 2015-01-03  9:39 UTC (permalink / raw)
  To: John Snow; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On Friday 02 January 2015 18:58:00 John Snow wrote:
> 
> On 12/27/2014 10:01 AM, Peter Wu wrote:
> > DMG files have a variable length with a UDIF trailer at the end of a
> > file. This UDIF trailer is essential as it describes the contents of
> > the image. At the moment however, the start of this trailer is almost
> > always incorrect as bdrv_getlength() returns a multiple of the block
> > size (rounded up). This results in a failure to recognize DMG files,
> > resulting in Invalid argument (EINVAL) errors.
> >
> > As there is no API to retrieve the real file size, look for the magic
> > header in the last two sectors to find the start of this 512-byte UDIF
> > trailer (the "koly" block).
> >
> > The resource fork offset ("info_begin") has its offset adjusted as the
> > initial value of offset does not mean "end of file" anymore, but "begin
> > of UDIF trailer".
> >
> > Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> > ---
> >   block/dmg.c | 40 ++++++++++++++++++++++++++++++++++++----
> >   1 file changed, 36 insertions(+), 4 deletions(-)
> >
> > diff --git a/block/dmg.c b/block/dmg.c
> > index e455886..df274f9 100644
> > --- a/block/dmg.c
> > +++ b/block/dmg.c
> > @@ -131,6 +131,39 @@ static void update_max_chunk_size(BDRVDMGState *s, uint32_t chunk,
> >       }
> >   }
> >
> > +static int64_t dmg_find_koly_offset(BlockDriverState *file_bs)
> > +{
> > +    int64_t length;
> > +    int64_t offset = 0;
> > +    uint8_t buffer[515];
> > +    int i, ret;
> > +
> > +    /* bdrv_getlength returns a multiple of block size (512), rounded up. Since
> > +     * dmg images can have odd sizes, try to look for the "koly" magic which
> > +     * marks the begin of the UDIF trailer (512 bytes). This magic can be found
> > +     * in the last 511 bytes of the second-last sector or the first 4 bytes of
> > +     * the last sector (search space: 515 bytes) */
> > +    length = bdrv_getlength(file_bs);
> > +    if (length < 512) {
> > +        return length < 0 ? length : -EINVAL;
> > +    }
> > +    if (length > 511 + 512) {
> > +        offset = length - 511 - 512;
> > +    }
> > +    length = length < 515 ? length : 515;
> > +    ret = bdrv_pread(file_bs, offset, buffer, length);
> > +    if (ret < 4) {
> > +        return ret < 0 ? ret : -EINVAL;
> > +    }
> > +    for (i = 0; i < length - 3; i++) {
> > +        if (buffer[i] == 'k' && buffer[i+1] == 'o' &&
> > +            buffer[i+2] == 'l' && buffer[i+3] == 'y') {
> > +            return offset + i;
> > +        }
> > +    }
> > +    return -EINVAL;
> > +}
> > +
> >   static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
> >                       Error **errp)
> >   {
> > @@ -145,15 +178,14 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
> >       s->n_chunks = 0;
> >       s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
> >
> > -    /* read offset of info blocks */
> > -    offset = bdrv_getlength(bs->file);
> > +    /* locate the UDIF trailer */
> > +    offset = dmg_find_koly_offset(bs->file);
> >       if (offset < 0) {
> >           ret = offset;
> >           goto fail;
> >       }
> > -    offset -= 0x1d8;
> >
> > -    ret = read_uint64(bs, offset, &info_begin);
> > +    ret = read_uint64(bs, offset + 0x28, &info_begin);
> >       if (ret < 0) {
> >           goto fail;
> >       } else if (info_begin == 0) {
> >
> 
> If there really is no convenient way to retrieve the real length ...
> (Stefan: Would that be difficult to add?)
> 
> Reviewed-by: John Snow <jsnow@redhat.com>

The real length can be stored, but it takes more effort to do so. See
the stalled work on this bdrv-getlength-conversion branch[1] and in
particular "block: do not directly set total_sectors"[2].
-- 
Kind regards,
Peter
https://lekensteyn.nl

 [1]: https://git.lekensteyn.nl/peter/qemu/log/?h=bdrv-getlength-conversion
 [2]: https://git.lekensteyn.nl/peter/qemu/commit/?h=bdrv-getlength-conversion&id=e5164735e5b674a10134894589a060a0f5f32ccc

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] block/dmg: extract mish block decoding functionality
  2015-01-02 23:59   ` John Snow
@ 2015-01-03 11:05     ` Peter Wu
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Wu @ 2015-01-03 11:05 UTC (permalink / raw)
  To: John Snow; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On Friday 02 January 2015 18:59:38 John Snow wrote:
> 
> On 12/27/2014 10:01 AM, Peter Wu wrote:
> > Extract the mish block decoder such that this can be used for other
> > formats in the future. A new DmgHeaderState struct is introduced to
> > share state while decoding.
> >
> > The code is kept unchanged as much as possible, a "fail" label is added
> > for example where a simple return would probably do.
> >
> > Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> > ---
> >   block/dmg.c | 216 +++++++++++++++++++++++++++++++++++-------------------------
> >   1 file changed, 125 insertions(+), 91 deletions(-)
> >
> > diff --git a/block/dmg.c b/block/dmg.c
> > index df274f9..6dc6dbb 100644
> > --- a/block/dmg.c
> > +++ b/block/dmg.c
> > @@ -164,19 +164,137 @@ static int64_t dmg_find_koly_offset(BlockDriverState *file_bs)
> >       return -EINVAL;
> >   }
> >
> > +/* used when building the sector table */
> > +typedef struct DmgHeaderState {
> > +    /* used internally by dmg_read_mish_block to remember offsets of blocks
> > +     * across calls */
> > +    uint64_t last_in_offset;
> > +    uint64_t last_out_offset;
> > +    /* exported for dmg_open */
> > +    uint32_t max_compressed_size;
> > +    uint32_t max_sectors_per_chunk;
> > +} DmgHeaderState;
> > +
> > +static int dmg_read_mish_block(BlockDriverState *bs, DmgHeaderState *ds,
> > +                               int64_t offset, uint32_t count)
> > +{
> > +    BDRVDMGState *s = bs->opaque;
> > +    uint32_t type, i;
> > +    int ret;
> > +    size_t new_size;
> > +    uint32_t chunk_count;
> > +
> > +    ret = read_uint32(bs, offset, &type);
> > +    if (ret < 0) {
> > +        goto fail;
> > +    }
> > +
> > +    /* skip data that is not a valid MISH block (invalid magic or too small) */
> > +    if (type != 0x6d697368 || count < 244) {
> > +        /* assume success for now */
> > +        return 0;
> > +    }
> > +
> > +    offset += 4;
> > +    offset += 200;
> > +
> > +    chunk_count = (count - 204) / 40;
> > +    new_size = sizeof(uint64_t) * (s->n_chunks + chunk_count);
> > +    s->types = g_realloc(s->types, new_size / 2);
> > +    s->offsets = g_realloc(s->offsets, new_size);
> > +    s->lengths = g_realloc(s->lengths, new_size);
> > +    s->sectors = g_realloc(s->sectors, new_size);
> > +    s->sectorcounts = g_realloc(s->sectorcounts, new_size);
> > +
> > +    for (i = s->n_chunks; i < s->n_chunks + chunk_count; i++) {
> > +        ret = read_uint32(bs, offset, &s->types[i]);
> > +        if (ret < 0) {
> > +            goto fail;
> > +        }
> > +        offset += 4;
> > +        if (s->types[i] != 0x80000005 && s->types[i] != 1 &&
> > +            s->types[i] != 2) {
> > +            if (s->types[i] == 0xffffffff && i > 0) {
> > +                ds->last_in_offset = s->offsets[i - 1] + s->lengths[i - 1];
> > +                ds->last_out_offset = s->sectors[i - 1] +
> > +                                      s->sectorcounts[i - 1];
> > +            }
> > +            chunk_count--;
> > +            i--;
> > +            offset += 36;
> > +            continue;
> > +        }
> > +        offset += 4;
> > +
> > +        ret = read_uint64(bs, offset, &s->sectors[i]);
> > +        if (ret < 0) {
> > +            goto fail;
> > +        }
> > +        s->sectors[i] += ds->last_out_offset;
> > +        offset += 8;
> > +
> > +        ret = read_uint64(bs, offset, &s->sectorcounts[i]);
> > +        if (ret < 0) {
> > +            goto fail;
> > +        }
> > +        offset += 8;
> > +
> > +        if (s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
> > +            error_report("sector count %" PRIu64 " for chunk %" PRIu32
> > +                         " is larger than max (%u)",
> > +                         s->sectorcounts[i], i, DMG_SECTORCOUNTS_MAX);
> > +            ret = -EINVAL;
> > +            goto fail;
> > +        }
> > +
> > +        ret = read_uint64(bs, offset, &s->offsets[i]);
> > +        if (ret < 0) {
> > +            goto fail;
> > +        }
> > +        s->offsets[i] += ds->last_in_offset;
> > +        offset += 8;
> > +
> > +        ret = read_uint64(bs, offset, &s->lengths[i]);
> > +        if (ret < 0) {
> > +            goto fail;
> > +        }
> > +        offset += 8;
> > +
> > +        if (s->lengths[i] > DMG_LENGTHS_MAX) {
> > +            error_report("length %" PRIu64 " for chunk %" PRIu32
> > +                         " is larger than max (%u)",
> > +                         s->lengths[i], i, DMG_LENGTHS_MAX);
> > +            ret = -EINVAL;
> > +            goto fail;
> > +        }
> > +
> > +        update_max_chunk_size(s, i, &ds->max_compressed_size,
> > +                              &ds->max_sectors_per_chunk);
> > +    }
> > +    s->n_chunks += chunk_count;
> > +    return 0;
> > +
> > +fail:
> > +    return ret;
> > +}
> > +
> >   static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
> >                       Error **errp)
> >   {
> >       BDRVDMGState *s = bs->opaque;
> > -    uint64_t info_begin, info_end, last_in_offset, last_out_offset;
> > +    DmgHeaderState ds;
> > +    uint64_t info_begin, info_end;
> >       uint32_t count, tmp;
> > -    uint32_t max_compressed_size = 1, max_sectors_per_chunk = 1, i;
> >       int64_t offset;
> >       int ret;
> >
> >       bs->read_only = 1;
> >       s->n_chunks = 0;
> >       s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
> > +    ds.last_in_offset = 0;
> > +    ds.last_out_offset = 0;
> > +    ds.max_compressed_size = 1;
> > +    ds.max_sectors_per_chunk = 1;
> 
> It might be nice to have a tiny initializer macro/function so that 
> future users of functions that use this structure don't need to know the 
> magic numbers.

This state is internal to the dmg_read_mish_block functionality for
which the dmg_open function is the only caller. I consider it similar to
the initialization for BDRVDMGState in the lines above so an additional
indirection to initialize DmgHeaderState does not seem warranted. In the
v2 patch a minor comment is added before this block.

The magic number 1 existed before and seem to be in-place to avoid
allocating a 0 size (for max_sectors_per_chunk). Why max_compressed_size
is 1 is beyond me though, but I copied this value from the previous
version.

> >       /* locate the UDIF trailer */
> >       offset = dmg_find_koly_offset(bs->file);
> > @@ -210,13 +328,11 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
> >       }
> >       info_end = info_begin + count;
> >
> > +    /* begin of mish block */
> >       offset = info_begin + 0x100;
> >
> >       /* read offsets */
> > -    last_in_offset = last_out_offset = 0;
> >       while (offset < info_end) {
> > -        uint32_t type;
> > -
> >           ret = read_uint32(bs, offset, &count);
> >           if (ret < 0) {
> >               goto fail;
> > @@ -226,100 +342,18 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
> >           }
> >           offset += 4;
> >
> > -        ret = read_uint32(bs, offset, &type);
> > +        ret = dmg_read_mish_block(bs, &ds, offset, count);
> >           if (ret < 0) {
> >               goto fail;
> >           }
> > -
> > -        if (type == 0x6d697368 && count >= 244) {
> > -            size_t new_size;
> > -            uint32_t chunk_count;
> > -
> > -            offset += 4;
> > -            offset += 200;
> > -
> > -            chunk_count = (count - 204) / 40;
> > -            new_size = sizeof(uint64_t) * (s->n_chunks + chunk_count);
> > -            s->types = g_realloc(s->types, new_size / 2);
> > -            s->offsets = g_realloc(s->offsets, new_size);
> > -            s->lengths = g_realloc(s->lengths, new_size);
> > -            s->sectors = g_realloc(s->sectors, new_size);
> > -            s->sectorcounts = g_realloc(s->sectorcounts, new_size);
> > -
> > -            for (i = s->n_chunks; i < s->n_chunks + chunk_count; i++) {
> > -                ret = read_uint32(bs, offset, &s->types[i]);
> > -                if (ret < 0) {
> > -                    goto fail;
> > -                }
> > -                offset += 4;
> > -                if (s->types[i] != 0x80000005 && s->types[i] != 1 &&
> > -                    s->types[i] != 2) {
> > -                    if (s->types[i] == 0xffffffff && i > 0) {
> > -                        last_in_offset = s->offsets[i - 1] + s->lengths[i - 1];
> > -                        last_out_offset = s->sectors[i - 1] +
> > -                                          s->sectorcounts[i - 1];
> > -                    }
> > -                    chunk_count--;
> > -                    i--;
> > -                    offset += 36;
> > -                    continue;
> > -                }
> > -                offset += 4;
> > -
> > -                ret = read_uint64(bs, offset, &s->sectors[i]);
> > -                if (ret < 0) {
> > -                    goto fail;
> > -                }
> > -                s->sectors[i] += last_out_offset;
> > -                offset += 8;
> > -
> > -                ret = read_uint64(bs, offset, &s->sectorcounts[i]);
> > -                if (ret < 0) {
> > -                    goto fail;
> > -                }
> > -                offset += 8;
> > -
> > -                if (s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
> > -                    error_report("sector count %" PRIu64 " for chunk %" PRIu32
> > -                                 " is larger than max (%u)",
> > -                                 s->sectorcounts[i], i, DMG_SECTORCOUNTS_MAX);
> > -                    ret = -EINVAL;
> > -                    goto fail;
> > -                }
> > -
> > -                ret = read_uint64(bs, offset, &s->offsets[i]);
> > -                if (ret < 0) {
> > -                    goto fail;
> > -                }
> > -                s->offsets[i] += last_in_offset;
> > -                offset += 8;
> > -
> > -                ret = read_uint64(bs, offset, &s->lengths[i]);
> > -                if (ret < 0) {
> > -                    goto fail;
> > -                }
> > -                offset += 8;
> > -
> > -                if (s->lengths[i] > DMG_LENGTHS_MAX) {
> > -                    error_report("length %" PRIu64 " for chunk %" PRIu32
> > -                                 " is larger than max (%u)",
> > -                                 s->lengths[i], i, DMG_LENGTHS_MAX);
> > -                    ret = -EINVAL;
> > -                    goto fail;
> > -                }
> > -
> > -                update_max_chunk_size(s, i, &max_compressed_size,
> > -                                      &max_sectors_per_chunk);
> > -            }
> > -            s->n_chunks += chunk_count;
> > -        }
> > +        offset += count;
> 
> A little justification here might be nice: I assume the "count" always 
> includes the full byte count of the mish block, and the old code just 
> incremented the offset up to this amount. This is the only functional 
> difference in this refactor that I can see.

This code got me confused again and after looking at this document[1], I
have added some explanatory comments for the next version.

The "count" in the loop refers to the contents of a single resource.
Basically we have this:

Resource header
 - 4 bytes rsrc data offset (assume 0x100)
 - 4 bytes rsrc map offset (ignored)
 - 4 bytes rsrc data length
 - 4 bytes rsrc map length (ignored)

The block of length "rsrc data length" at offset 0x100 consists of one
or more resources, each described by:
 - 4 bytes resource length
 - "resource length" number of bytes. Its contents is a "mish" block for
   our purposes.

Indeed, in this patch I avoided the need to copy from the mish block
parser as it is not really necessary given that the resource length is
known.

 [1]: https://developer.apple.com/legacy/library/documentation/mac/pdf/MoreMacintoshToolbox.pdf#page=151

> >       }
> >
> >       /* initialize zlib engine */
> >       s->compressed_chunk = qemu_try_blockalign(bs->file,
> > -                                              max_compressed_size + 1);
> > +                                              ds.max_compressed_size + 1);
> >       s->uncompressed_chunk = qemu_try_blockalign(bs->file,
> > -                                                512 * max_sectors_per_chunk);
> > +                                                512 * ds.max_sectors_per_chunk);
> >       if (s->compressed_chunk == NULL || s->uncompressed_chunk == NULL) {
> >           ret = -ENOMEM;
> >           goto fail;
> >
> 
> Thank you for splitting this monolithic function up into nicer pieces. 
> With or without addressing my comments above:
> 
> Reviewed-by: John Snow <jsnow@redhat.com>

Since I have only added some comments and renamed a variable, I'll carry
this R-b to the v2 patch.
-- 
Kind regards,
Peter
https://lekensteyn.nl

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 03/10] block/dmg: extract processing of resource forks
  2015-01-03  0:01   ` John Snow
@ 2015-01-03 11:24     ` Peter Wu
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Wu @ 2015-01-03 11:24 UTC (permalink / raw)
  To: John Snow; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On Friday 02 January 2015 19:01:06 John Snow wrote:
> On 12/27/2014 10:01 AM, Peter Wu wrote:
> > Besides the offset, also read the resource length. This length is now
> > used in the extracted function to verify the end of the resource fork
> > against "count" from the resource fork.
> >
> > Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> > ---
> >   block/dmg.c | 90 ++++++++++++++++++++++++++++++++++++++++---------------------
> >   1 file changed, 59 insertions(+), 31 deletions(-)
> >
> > diff --git a/block/dmg.c b/block/dmg.c
> > index 6dc6dbb..7f49388 100644
> > --- a/block/dmg.c
> > +++ b/block/dmg.c
> > @@ -278,38 +278,13 @@ fail:
> >       return ret;
> >   }
> >
> > -static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
> > -                    Error **errp)
> > +static int dmg_read_resource_fork(BlockDriverState *bs, DmgHeaderState *ds,
> > +                                  uint64_t info_begin, uint64_t info_length)
> >   {
> > -    BDRVDMGState *s = bs->opaque;
> > -    DmgHeaderState ds;
> > -    uint64_t info_begin, info_end;
> > -    uint32_t count, tmp;
> > -    int64_t offset;
> >       int ret;
> > -
> > -    bs->read_only = 1;
> > -    s->n_chunks = 0;
> > -    s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
> > -    ds.last_in_offset = 0;
> > -    ds.last_out_offset = 0;
> > -    ds.max_compressed_size = 1;
> > -    ds.max_sectors_per_chunk = 1;
> > -
> > -    /* locate the UDIF trailer */
> > -    offset = dmg_find_koly_offset(bs->file);
> > -    if (offset < 0) {
> > -        ret = offset;
> > -        goto fail;
> > -    }
> > -
> > -    ret = read_uint64(bs, offset + 0x28, &info_begin);
> > -    if (ret < 0) {
> > -        goto fail;
> > -    } else if (info_begin == 0) {
> > -        ret = -EINVAL;
> > -        goto fail;
> > -    }
> > +    uint32_t count, tmp;
> > +    uint64_t info_end;
> > +    uint64_t offset;
> >
> >       ret = read_uint32(bs, info_begin, &tmp);
> >       if (ret < 0) {
> > @@ -326,6 +301,10 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
> >           ret = -EINVAL;
> >           goto fail;
> >       }
> > +    if (count > info_length) {
> > +        ret = -EINVAL;
> > +        goto fail;
> > +    }
> >       info_end = info_begin + count;
> >
> >       /* begin of mish block */
> > @@ -342,12 +321,61 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
> >           }
> >           offset += 4;
> >
> > -        ret = dmg_read_mish_block(bs, &ds, offset, count);
> > +        ret = dmg_read_mish_block(bs, ds, offset, count);
> >           if (ret < 0) {
> >               goto fail;
> >           }
> >           offset += count;
> >       }
> > +    return 0;
> > +
> > +fail:
> > +    return ret;
> > +}
> > +
> > +static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
> > +                    Error **errp)
> > +{
> > +    BDRVDMGState *s = bs->opaque;
> > +    DmgHeaderState ds;
> > +    uint64_t rsrc_fork_offset, rsrc_fork_length;
> > +    int64_t offset;
> > +    int ret;
> > +
> > +    bs->read_only = 1;
> > +    s->n_chunks = 0;
> > +    s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
> > +    ds.last_in_offset = 0;
> > +    ds.last_out_offset = 0;
> > +    ds.max_compressed_size = 1;
> > +    ds.max_sectors_per_chunk = 1;
> > +
> > +    /* locate the UDIF trailer */
> > +    offset = dmg_find_koly_offset(bs->file);
> > +    if (offset < 0) {
> > +        ret = offset;
> > +        goto fail;
> > +    }
> > +
> > +    /* offset of resource fork (RsrcForkOffset) */
> > +    ret = read_uint64(bs, offset + 0x28, &rsrc_fork_offset);
> > +    if (ret < 0) {
> > +        goto fail;
> > +    }
> > +    ret = read_uint64(bs, offset + 0x30, &rsrc_fork_length);
> > +    if (ret < 0) {
> > +        goto fail;
> > +    }
> > +    if (rsrc_fork_offset != 0 && rsrc_fork_length != 0) {
> > +        ret = dmg_read_resource_fork(bs, &ds,
> > +                                     rsrc_fork_offset, rsrc_fork_length);
> > +        if (ret < 0) {
> > +            goto fail;
> > +        }
> > +    } else {
> > +        ret = -EINVAL;
> > +        goto fail;
> > +    }
> >
> >       /* initialize zlib engine */
> >       s->compressed_chunk = qemu_try_blockalign(bs->file,
> >
> 
> I know your refactor hardly touches the code here, but this is a good 
> place to mention it:
> 
>  From what I can tell from the Resource Fork format
> https://developer.apple.com/legacy/library/documentation/mac/pdf/MoreMacintoshToolbox.pdf#page=151
> 
> there are four fields of interest in the header of the resourcefork 
> before we start looping trying to find mish data:
> 
> uint32_t data_offset;
> uint32_t map_offset;
> uint32_t data_length;
> uint32_t map_length;
> 
> We are treating the map which comes right after the data as the "data 
> length" which is not too far from the truth, but may include extra bytes.

The rsrc_fork_offset and rsrc_fork_length values above come from the
trailer, not the resource fork... but now that I read the new function I
see what you mean here. The offset '4' should be '8'. I'll fix this in
v2.

> e.g., in the vlc dmg you linked:
> 
> The koly header advertises a Resource Fork length of 93276 bytes.
> the resource fork itself has a header of:
> 
> data_offset := 256
> map_offset := 92860	// not length!
> data_length := 92604
> map_length := 416
> 
> 
> Since we are naming this function explicitly after the Resource Fork, we 
> should probably tidy this up just a little bit to be clearer at the very 
> least.
> 
> The space reserved for system use, here asserted by QEMU to always be 
> 0x100, is technically variable and perhaps could be adjusted as well 
> with minimal changes.

This artificial restriction looks unnecessary even when there are no
other files in practice. I have replaced it by a saner check for v2.

> A comment pointing to this documentation for the format here would also 
> be helpful, as your other source details koly and mish, but Resource 
> Fork was missing.

I'll mention this in the commit message, thanks!
-- 
Kind regards,
Peter
https://lekensteyn.nl

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 06/10] block/dmg: process XML plists
  2015-01-03  0:04   ` John Snow
@ 2015-01-03 11:54     ` Peter Wu
  2015-01-05 16:46       ` John Snow
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Wu @ 2015-01-03 11:54 UTC (permalink / raw)
  To: John Snow; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On Friday 02 January 2015 19:04:32 John Snow wrote:
> On 12/27/2014 10:01 AM, Peter Wu wrote:
> > The format is simple enough to avoid using a full-blown XML parser.
> > The offsets are based on the description at
> > http://newosxbook.com/DMG.html
> >
> > Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> > ---
> >   block/dmg.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >   1 file changed, 69 insertions(+)
> >
> > diff --git a/block/dmg.c b/block/dmg.c
> > index 19e4fe2..c03ea01 100644
> > --- a/block/dmg.c
> > +++ b/block/dmg.c
> > @@ -26,6 +26,7 @@
> >   #include "qemu/bswap.h"
> >   #include "qemu/module.h"
> >   #include <zlib.h>
> > +#include <glib.h>
> >
> >   enum {
> >       /* Limit chunk sizes to prevent unreasonable amounts of memory being used
> > @@ -333,12 +334,66 @@ fail:
> >       return ret;
> >   }
> >
> > +static int dmg_read_plist_xml(BlockDriverState *bs, DmgHeaderState *ds,
> > +                              uint64_t info_begin, uint64_t info_length)
> > +{
> > +    BDRVDMGState *s = bs->opaque;
> > +    int ret;
> > +    uint8_t *buffer = NULL;
> > +    char *data_begin, *data_end;
> > +
> > +    /* Have at least some length to avoid NULL for g_malloc. Attempt to set a
> > +     * safe upper cap on the data length. A test sample had a XML length of
> > +     * about 1 MiB. */
> > +    if (info_length == 0 || info_length > 16 * 1024 * 1024) {
> > +        ret = -EINVAL;
> > +        goto fail;
> > +    }
> > +
> > +    buffer = g_malloc(info_length + 1);
> > +    buffer[info_length] = '\0';
> > +    ret = bdrv_pread(bs->file, info_begin, buffer, info_length);
> > +    if (ret != info_length) {
> > +        ret = -EINVAL;
> > +        goto fail;
> > +    }
> > +
> > +    /* look for <data>...</data>. The data is 284 (0x11c) bytes after base64
> > +     * decode. The actual data element has 431 (0x1af) bytes which includes tabs
> > +     * and line feeds. */
> > +    data_end = (char *)buffer;
> > +    while ((data_begin = strstr(data_end, "<data>")) != NULL) {
> > +        gsize out_len = 0;
> > +
> > +        data_begin += 6;
> > +        data_end = strstr(data_begin, "</data>");
> > +        /* malformed XML? */
> > +        if (data_end == NULL) {
> > +            ret = -EINVAL;
> > +            goto fail;
> > +        }
> > +        *data_end++ = '\0';
> > +        g_base64_decode_inplace(data_begin, &out_len);
> > +        ret = dmg_read_mish_block(s, ds, (uint8_t *)data_begin,
> > +                                  (uint32_t)out_len);
> > +        if (ret < 0) {
> > +            goto fail;
> > +        }
> > +    }
> > +    ret = 0;
> > +
> > +fail:
> > +    g_free(buffer);
> > +    return ret;
> > +}
> > +
> 
> This starts to make me a little nervous, because we're ignoring so much 
> of the XML document structure here and just effectively performing a 
> regular search for "<data>(.*)</data>".
> 
> Can we guarantee that the ONLY time the data element is used in this 
> document is when it is being used in the exact context we are expecting 
> here, where it contains the b64 mish data we expect it to?
> 
> i.e. it is always in a path like this as detailed by 
> http://newosxbook.com/DMG.html :
> 
> plist/dict/key[text()='resource-fork']/following-sibling::dict/key[text()='blkx']/following-sibling::array/dict/key[text()='data']/following-sibling::data
> 
> I notice that this document says other sections MAY be present, do any 
> of them ever need to be parsed? Has anyone written about them before?
> 
> Do we know if any use data sections?
> 
> I suppose at the very least, sections of interest are always going to 
> include the "mish" magic, so that should probably keep us from doing 
> anything too stupid ...

I did not find DMG files with <data> elements at other locations. If it
would occur, at worst we would fail to parse a DMG file. I think that
introducing a XML parser here would introduce a risk for a minor benefit
(being prepared for future cases).

Since this is a property list, in theory people could include all kinds
of data for different keys (which would then be matched by the current
implementation). But how likely is this for a disk image?

FWIW, I looked into the dmg2img program and that also looks for the
strings "<data>" and "</data>". Nobody has raised a bug for that program
so far.

Do you think that it is worth to use a XML parser on potentially
insecure data? I suggest to keep it as it, and reconsider a different
approach in case a problem is encountered.

Kind regards,
Peter

> >   static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
> >                       Error **errp)
> >   {
> >       BDRVDMGState *s = bs->opaque;
> >       DmgHeaderState ds;
> >       uint64_t rsrc_fork_offset, rsrc_fork_length;
> > +    uint64_t plist_xml_offset, plist_xml_length;
> >       int64_t offset;
> >       int ret;
> >
> > @@ -366,12 +421,26 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
> >       if (ret < 0) {
> >           goto fail;
> >       }
> > +    /* offset of property list (XMLOffset) */
> > +    ret = read_uint64(bs, offset + 0xd8, &plist_xml_offset);
> > +    if (ret < 0) {
> > +        goto fail;
> > +    }
> > +    ret = read_uint64(bs, offset + 0xe0, &plist_xml_length);
> > +    if (ret < 0) {
> > +        goto fail;
> > +    }
> >       if (rsrc_fork_offset != 0 && rsrc_fork_length != 0) {
> >           ret = dmg_read_resource_fork(bs, &ds,
> >                                        rsrc_fork_offset, rsrc_fork_length);
> >           if (ret < 0) {
> >               goto fail;
> >           }
> > +    } else if (plist_xml_offset != 0 && plist_xml_length != 0) {
> > +        ret = dmg_read_plist_xml(bs, &ds, plist_xml_offset, plist_xml_length);
> > +        if (ret < 0) {
> > +            goto fail;
> > +        }
> >       } else {
> >           ret = -EINVAL;
> >           goto fail;
> >

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 08/10] block/dmg: fix sector data offset calculation
  2015-01-03  0:05   ` John Snow
@ 2015-01-03 12:47     ` Peter Wu
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Wu @ 2015-01-03 12:47 UTC (permalink / raw)
  To: John Snow; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On Friday 02 January 2015 19:05:29 John Snow wrote:
> On 12/27/2014 10:01 AM, Peter Wu wrote:
> > This patch addresses two issues:
> >
> >   - The data fork offset was not taken into account, resulting in failure
> >     to read an InstallESD.dmg file (5164763151 bytes) which had a
> >     non-zero DataForkOffset field.
> >   - The offset of the previous block ("partition") was unconditionally
> >     added to the current block because older files would start the input
> >     offset of a new block at zero. Newer files (including vlc-2.1.5.dmg,
> >     tuxpaint-0.9.15-macosx.dmg and OS X Yosemite [MAS].dmg) failed in
> >     reads because these files have a continuous input offset.
> >
> 
> What does "continuous input offset" mean? This change is not as clear to 
> me, see below.

By "continuous" I mean that the new files have absolute offsets while
the offsets in older files were relative to the previous block.

> > Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> > ---
> >   block/dmg.c | 16 +++++++++++++++-
> >   1 file changed, 15 insertions(+), 1 deletion(-)
> >
> > diff --git a/block/dmg.c b/block/dmg.c
> > index 984997f..93b597f 100644
> > --- a/block/dmg.c
> > +++ b/block/dmg.c
> > @@ -179,6 +179,7 @@ static int64_t dmg_find_koly_offset(BlockDriverState *file_bs)
> >   typedef struct DmgHeaderState {
> >       /* used internally by dmg_read_mish_block to remember offsets of blocks
> >        * across calls */
> > +    uint64_t data_fork_offset;
> >       uint64_t last_in_offset;
> >       uint64_t last_out_offset;
> >       /* exported for dmg_open */
> > @@ -194,6 +195,7 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds,
> >       size_t new_size;
> >       uint32_t chunk_count;
> >       int64_t offset = 0;
> > +    uint64_t in_offset = ds->data_fork_offset;
> >
> >       type = buff_read_uint32(buffer, offset);
> >       /* skip data that is not a valid MISH block (invalid magic or too small) */
> > @@ -246,7 +248,12 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds,
> >           }
> >
> >           s->offsets[i] = buff_read_uint64(buffer, offset);
> > -        s->offsets[i] += ds->last_in_offset;
> > +        /* If this offset is below the previous chunk end, then assume that all
> > +         * following offsets are after the previous chunks. */
> > +        if (s->offsets[i] + in_offset < ds->last_in_offset) {
> > +            in_offset = ds->last_in_offset;
> > +        }
> > +        s->offsets[i] += in_offset;
> 
> I take it that all of the offsets referenced in the mish structures are 
> relative to the start of the data fork block, which is why we are taking 
> a value from the koly block and applying it to mish block values.
> 
> correct?

Correct, the mish block describes the contents of the data fork.
http://newosxbook.com/DMG.html says:

typedef struct {
        // ...
        uint64_t CompressedOffset;  // Start of chunk in data fork
        uint64_t CompressedLength;  // Count of bytes of chunk, in data fork
} __attribute__((__packed__)) BLKXChunkEntry;

> >           offset += 8;
> >
> >           s->lengths[i] = buff_read_uint64(buffer, offset);
> > @@ -400,6 +407,7 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
> >       bs->read_only = 1;
> >       s->n_chunks = 0;
> >       s->offsets = s->lengths = s->sectors = s->sectorcounts = NULL;
> > +    ds.data_fork_offset = 0;
> >       ds.last_in_offset = 0;
> >       ds.last_out_offset = 0;
> >       ds.max_compressed_size = 1;
> > @@ -412,6 +420,12 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
> >           goto fail;
> >       }
> >
> > +    /* offset of data fork (DataForkOffset) */
> > +    ret = read_uint64(bs, offset + 0x18, &ds.data_fork_offset);
> > +    if (ret < 0) {
> > +        goto fail;
> > +    }
> > +
> >       /* offset of resource fork (RsrcForkOffset) */
> >       ret = read_uint64(bs, offset + 0x28, &rsrc_fork_offset);
> >       if (ret < 0) {
> >
> 
> A general question here:
> 
> Are we ever reading the preamble of the mish block? I see we are reading 
> the 'n' items of 40-byte chunk data, but is there a reason we skip the 
> first 200 bytes of mish data, or have I misread the documents on DMG 
> that exist?

We only use the Signature field to verify that we indeed have a BLKX
entry (required since the XML fork may yield other kind of data which
does not have this magic).

The UDIFChecksum field is 136 bytes (confirmed by a internet search).
Adding the other fields (version..reserved6 and NumberOfBlockChunks)
results in 200 (+4 for the Signature).

> It looks like there are some good fields here: SectorNumber, 
> SectorCount, DataOffset, and BlockDescriptors -- can these not be used 
> to provide a more explicit error-checking of offsets, allowing us to 
> make less assumptions about where these blocks begin and end?
> 
> Is there some reason they are unreliable?

As far as I know this is not checked because nobody added it. I am not
aware of incorrect values inside this block. Let's see:

 - Version: could check this against '1' and fail if unknown?
 - SectorNumber: looks like this can be taken as the absolute offset,
   all entries seem to be relative to this one.
 - SectorCount: looks like the number of sectors which should match the
   entries (at least for the tuxpaint example dmg file).
 - DataOffset: 0 in the tuxpaint example. Perhaps it should be added to
   the data fork offset, but let's ignore it for now.
 - 0x208 for 3/4 mish blocks, 0 for the last mish block which does not
   seem to contain data.
 - NumberOfBlockChunks: 0xFFFFFFFF, 0, 1, 2 respectively for the mish
   blocks. No idea what this means. Probably the ordering, but we assume
   that the offsets are sorted AFAIK.

I will try to make use of SectorNumber and SectorCount, the others will
be ignored for now.
-- 
Kind regards,
Peter
https://lekensteyn.nl

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 06/10] block/dmg: process XML plists
  2015-01-03 11:54     ` Peter Wu
@ 2015-01-05 16:46       ` John Snow
  0 siblings, 0 replies; 39+ messages in thread
From: John Snow @ 2015-01-05 16:46 UTC (permalink / raw)
  To: Peter Wu; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi



On 01/03/2015 06:54 AM, Peter Wu wrote:
> On Friday 02 January 2015 19:04:32 John Snow wrote:
>> On 12/27/2014 10:01 AM, Peter Wu wrote:
>>> The format is simple enough to avoid using a full-blown XML parser.
>>> The offsets are based on the description at
>>> http://newosxbook.com/DMG.html
>>>
>>> Signed-off-by: Peter Wu <peter@lekensteyn.nl>
>>> ---
>>>    block/dmg.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>    1 file changed, 69 insertions(+)
>>>
>>> diff --git a/block/dmg.c b/block/dmg.c
>>> index 19e4fe2..c03ea01 100644
>>> --- a/block/dmg.c
>>> +++ b/block/dmg.c
>>> @@ -26,6 +26,7 @@
>>>    #include "qemu/bswap.h"
>>>    #include "qemu/module.h"
>>>    #include <zlib.h>
>>> +#include <glib.h>
>>>
>>>    enum {
>>>        /* Limit chunk sizes to prevent unreasonable amounts of memory being used
>>> @@ -333,12 +334,66 @@ fail:
>>>        return ret;
>>>    }
>>>
>>> +static int dmg_read_plist_xml(BlockDriverState *bs, DmgHeaderState *ds,
>>> +                              uint64_t info_begin, uint64_t info_length)
>>> +{
>>> +    BDRVDMGState *s = bs->opaque;
>>> +    int ret;
>>> +    uint8_t *buffer = NULL;
>>> +    char *data_begin, *data_end;
>>> +
>>> +    /* Have at least some length to avoid NULL for g_malloc. Attempt to set a
>>> +     * safe upper cap on the data length. A test sample had a XML length of
>>> +     * about 1 MiB. */
>>> +    if (info_length == 0 || info_length > 16 * 1024 * 1024) {
>>> +        ret = -EINVAL;
>>> +        goto fail;
>>> +    }
>>> +
>>> +    buffer = g_malloc(info_length + 1);
>>> +    buffer[info_length] = '\0';
>>> +    ret = bdrv_pread(bs->file, info_begin, buffer, info_length);
>>> +    if (ret != info_length) {
>>> +        ret = -EINVAL;
>>> +        goto fail;
>>> +    }
>>> +
>>> +    /* look for <data>...</data>. The data is 284 (0x11c) bytes after base64
>>> +     * decode. The actual data element has 431 (0x1af) bytes which includes tabs
>>> +     * and line feeds. */
>>> +    data_end = (char *)buffer;
>>> +    while ((data_begin = strstr(data_end, "<data>")) != NULL) {
>>> +        gsize out_len = 0;
>>> +
>>> +        data_begin += 6;
>>> +        data_end = strstr(data_begin, "</data>");
>>> +        /* malformed XML? */
>>> +        if (data_end == NULL) {
>>> +            ret = -EINVAL;
>>> +            goto fail;
>>> +        }
>>> +        *data_end++ = '\0';
>>> +        g_base64_decode_inplace(data_begin, &out_len);
>>> +        ret = dmg_read_mish_block(s, ds, (uint8_t *)data_begin,
>>> +                                  (uint32_t)out_len);
>>> +        if (ret < 0) {
>>> +            goto fail;
>>> +        }
>>> +    }
>>> +    ret = 0;
>>> +
>>> +fail:
>>> +    g_free(buffer);
>>> +    return ret;
>>> +}
>>> +
>>
>> This starts to make me a little nervous, because we're ignoring so much
>> of the XML document structure here and just effectively performing a
>> regular search for "<data>(.*)</data>".
>>
>> Can we guarantee that the ONLY time the data element is used in this
>> document is when it is being used in the exact context we are expecting
>> here, where it contains the b64 mish data we expect it to?
>>
>> i.e. it is always in a path like this as detailed by
>> http://newosxbook.com/DMG.html :
>>
>> plist/dict/key[text()='resource-fork']/following-sibling::dict/key[text()='blkx']/following-sibling::array/dict/key[text()='data']/following-sibling::data
>>
>> I notice that this document says other sections MAY be present, do any
>> of them ever need to be parsed? Has anyone written about them before?
>>
>> Do we know if any use data sections?
>>
>> I suppose at the very least, sections of interest are always going to
>> include the "mish" magic, so that should probably keep us from doing
>> anything too stupid ...
>
> I did not find DMG files with <data> elements at other locations. If it
> would occur, at worst we would fail to parse a DMG file. I think that
> introducing a XML parser here would introduce a risk for a minor benefit
> (being prepared for future cases).
>
> Since this is a property list, in theory people could include all kinds
> of data for different keys (which would then be matched by the current
> implementation). But how likely is this for a disk image?
>
> FWIW, I looked into the dmg2img program and that also looks for the
> strings "<data>" and "</data>". Nobody has raised a bug for that program
> so far.
>
> Do you think that it is worth to use a XML parser on potentially
> insecure data? I suggest to keep it as it, and reconsider a different
> approach in case a problem is encountered.
>
> Kind regards,
> Peter

No: I was just asking the questions. If dmg2img gets away with it, the 
worst that will happen is we will fail to parse/load a DMG file because 
we ignore everything without the "mish" magic, so this is OK.

I just wanted to check, since I didn't have a lot of DMG files on-hand 
and I couldn't really find a fuller reference to the types of XML that 
shows up.

Thanks!

>>>    static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
>>>                        Error **errp)
>>>    {
>>>        BDRVDMGState *s = bs->opaque;
>>>        DmgHeaderState ds;
>>>        uint64_t rsrc_fork_offset, rsrc_fork_length;
>>> +    uint64_t plist_xml_offset, plist_xml_length;
>>>        int64_t offset;
>>>        int ret;
>>>
>>> @@ -366,12 +421,26 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
>>>        if (ret < 0) {
>>>            goto fail;
>>>        }
>>> +    /* offset of property list (XMLOffset) */
>>> +    ret = read_uint64(bs, offset + 0xd8, &plist_xml_offset);
>>> +    if (ret < 0) {
>>> +        goto fail;
>>> +    }
>>> +    ret = read_uint64(bs, offset + 0xe0, &plist_xml_length);
>>> +    if (ret < 0) {
>>> +        goto fail;
>>> +    }
>>>        if (rsrc_fork_offset != 0 && rsrc_fork_length != 0) {
>>>            ret = dmg_read_resource_fork(bs, &ds,
>>>                                         rsrc_fork_offset, rsrc_fork_length);
>>>            if (ret < 0) {
>>>                goto fail;
>>>            }
>>> +    } else if (plist_xml_offset != 0 && plist_xml_length != 0) {
>>> +        ret = dmg_read_plist_xml(bs, &ds, plist_xml_offset, plist_xml_length);
>>> +        if (ret < 0) {
>>> +            goto fail;
>>> +        }
>>>        } else {
>>>            ret = -EINVAL;
>>>            goto fail;
>>>
>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 06/10] block/dmg: process XML plists
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 06/10] block/dmg: process XML plists Peter Wu
  2015-01-03  0:04   ` John Snow
@ 2015-01-05 16:54   ` John Snow
  1 sibling, 0 replies; 39+ messages in thread
From: John Snow @ 2015-01-05 16:54 UTC (permalink / raw)
  To: Peter Wu, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi



On 12/27/2014 10:01 AM, Peter Wu wrote:
> The format is simple enough to avoid using a full-blown XML parser.
> The offsets are based on the description at
> http://newosxbook.com/DMG.html
>
> Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> ---
>   block/dmg.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>   1 file changed, 69 insertions(+)
>
> diff --git a/block/dmg.c b/block/dmg.c
> index 19e4fe2..c03ea01 100644
> --- a/block/dmg.c
> +++ b/block/dmg.c
> @@ -26,6 +26,7 @@
>   #include "qemu/bswap.h"
>   #include "qemu/module.h"
>   #include <zlib.h>
> +#include <glib.h>
>
>   enum {
>       /* Limit chunk sizes to prevent unreasonable amounts of memory being used
> @@ -333,12 +334,66 @@ fail:
>       return ret;
>   }
>
> +static int dmg_read_plist_xml(BlockDriverState *bs, DmgHeaderState *ds,
> +                              uint64_t info_begin, uint64_t info_length)
> +{
> +    BDRVDMGState *s = bs->opaque;
> +    int ret;
> +    uint8_t *buffer = NULL;
> +    char *data_begin, *data_end;
> +
> +    /* Have at least some length to avoid NULL for g_malloc. Attempt to set a
> +     * safe upper cap on the data length. A test sample had a XML length of
> +     * about 1 MiB. */
> +    if (info_length == 0 || info_length > 16 * 1024 * 1024) {
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    buffer = g_malloc(info_length + 1);
> +    buffer[info_length] = '\0';
> +    ret = bdrv_pread(bs->file, info_begin, buffer, info_length);
> +    if (ret != info_length) {
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    /* look for <data>...</data>. The data is 284 (0x11c) bytes after base64
> +     * decode. The actual data element has 431 (0x1af) bytes which includes tabs
> +     * and line feeds. */
> +    data_end = (char *)buffer;
> +    while ((data_begin = strstr(data_end, "<data>")) != NULL) {
> +        gsize out_len = 0;
> +
> +        data_begin += 6;
> +        data_end = strstr(data_begin, "</data>");
> +        /* malformed XML? */
> +        if (data_end == NULL) {
> +            ret = -EINVAL;
> +            goto fail;
> +        }
> +        *data_end++ = '\0';
> +        g_base64_decode_inplace(data_begin, &out_len);
> +        ret = dmg_read_mish_block(s, ds, (uint8_t *)data_begin,
> +                                  (uint32_t)out_len);
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +    }
> +    ret = 0;
> +
> +fail:
> +    g_free(buffer);
> +    return ret;
> +}
> +
>   static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
>                       Error **errp)
>   {
>       BDRVDMGState *s = bs->opaque;
>       DmgHeaderState ds;
>       uint64_t rsrc_fork_offset, rsrc_fork_length;
> +    uint64_t plist_xml_offset, plist_xml_length;
>       int64_t offset;
>       int ret;
>
> @@ -366,12 +421,26 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
>       if (ret < 0) {
>           goto fail;
>       }
> +    /* offset of property list (XMLOffset) */
> +    ret = read_uint64(bs, offset + 0xd8, &plist_xml_offset);
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +    ret = read_uint64(bs, offset + 0xe0, &plist_xml_length);
> +    if (ret < 0) {
> +        goto fail;
> +    }
>       if (rsrc_fork_offset != 0 && rsrc_fork_length != 0) {
>           ret = dmg_read_resource_fork(bs, &ds,
>                                        rsrc_fork_offset, rsrc_fork_length);
>           if (ret < 0) {
>               goto fail;
>           }
> +    } else if (plist_xml_offset != 0 && plist_xml_length != 0) {
> +        ret = dmg_read_plist_xml(bs, &ds, plist_xml_offset, plist_xml_length);
> +        if (ret < 0) {
> +            goto fail;
> +        }
>       } else {
>           ret = -EINVAL;
>           goto fail;
>

After discussion, and because the dmg2img program uses a similar 
strategy, this is probably safe enough -- failures do not seem likely 
and if they occur, we will simply ignore the erroneous data.

We can complicate this in the future if we need to, as stated.

Reviewed-by: John Snow <jsnow@redhat.com>

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 09/10] block/dmg: support bzip2 block entry types
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 09/10] block/dmg: support bzip2 block entry types Peter Wu
@ 2015-01-05 19:32   ` John Snow
  2015-01-07 10:29     ` Paolo Bonzini
  0 siblings, 1 reply; 39+ messages in thread
From: John Snow @ 2015-01-05 19:32 UTC (permalink / raw)
  To: Peter Wu, qemu-devel
  Cc: Kevin Wolf, pbonzini >> Paolo Bonzini, Stefan Hajnoczi



On 12/27/2014 10:01 AM, Peter Wu wrote:
> This patch adds support for bzip2-compressed block entries as introduced
> with OS X 10.4 (source: https://en.wikipedia.org/wiki/Apple_Disk_Image).
>
> It was tested against a 5.2G "OS X Yosemite" installation image which
> stores the BLXX block in the XML property list (instead of resource
> forks) and has over 5k chunks.
>
> New configure entries are added (--enable-bzip2 / --disable-bzip2) to
> control inclusion of bzip2 functionality (which requires linking against
> libbz2). The help message suggests that this option is needed for DMG
> files, but the tests are generic enough that other parts of QEMU can use
> bzip2 if needed.
>
> The identifiers are based on http://newosxbook.com/DMG.html.
>
> The decompression routines are based on the zlib case, but as there is
> no way to reset the decompression state (unlike zlib), memory is
> allocated and deallocated for every decompression. This should not be
> problematic as the decompression takes most of the time and as blocks
> are typically about/over 1 MiB in size, only one allocation is done
> every 2000 sectors.
>
> Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> ---
>   block/dmg.c | 56 +++++++++++++++++++++++++++++++++++++++++++++++++++++---
>   configure   | 31 +++++++++++++++++++++++++++++++
>   2 files changed, 84 insertions(+), 3 deletions(-)
>
> diff --git a/block/dmg.c b/block/dmg.c
> index 93b597f..67d4e2b 100644
> --- a/block/dmg.c
> +++ b/block/dmg.c
> @@ -26,6 +26,9 @@
>   #include "qemu/bswap.h"
>   #include "qemu/module.h"
>   #include <zlib.h>
> +#ifdef CONFIG_BZIP2
> +#include <bzlib.h>
> +#endif
>   #include <glib.h>
>
>   enum {
> @@ -56,6 +59,9 @@ typedef struct BDRVDMGState {
>       uint8_t *compressed_chunk;
>       uint8_t *uncompressed_chunk;
>       z_stream zstream;
> +#ifdef CONFIG_BZIP2
> +    bz_stream bzstream;
> +#endif
>   } BDRVDMGState;
>
>   static int dmg_probe(const uint8_t *buf, int buf_size, const char *filename)
> @@ -123,6 +129,7 @@ static void update_max_chunk_size(BDRVDMGState *s, uint32_t chunk,
>
>       switch (s->types[chunk]) {
>       case 0x80000005: /* zlib compressed */
> +    case 0x80000006: /* bzip2 compressed */
>           compressed_size = s->lengths[chunk];
>           uncompressed_sectors = s->sectorcounts[chunk];
>           break;
> @@ -187,6 +194,21 @@ typedef struct DmgHeaderState {
>       uint32_t max_sectors_per_chunk;
>   } DmgHeaderState;
>
> +static bool dmg_is_valid_block_type(uint32_t entry_type)
> +{
> +    switch (entry_type) {
> +    case 0x00000001:    /* uncompressed */
> +    case 0x00000002:    /* zeroes */
> +    case 0x80000005:    /* zlib */
> +#ifdef CONFIG_BZIP2
> +    case 0x80000006:    /* bzip2 */
> +#endif
> +        return true;
> +    default:
> +        return false;
> +    }
> +}
> +

Wish I had read this patch first before trying to read the blob of 
conditionals this replaces. :)

>   static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds,
>                                  uint8_t *buffer, uint32_t count)
>   {
> @@ -218,8 +240,7 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds,
>       for (i = s->n_chunks; i < s->n_chunks + chunk_count; i++) {
>           s->types[i] = buff_read_uint32(buffer, offset);
>           offset += 4;
> -        if (s->types[i] != 0x80000005 && s->types[i] != 1 &&
> -            s->types[i] != 2) {
> +        if (!dmg_is_valid_block_type(s->types[i])) {
>               if (s->types[i] == 0xffffffff && i > 0) {
>                   ds->last_in_offset = s->offsets[i - 1] + s->lengths[i - 1];
>                   ds->last_out_offset = s->sectors[i - 1] +
> @@ -534,13 +555,14 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
>       if (!is_sector_in_chunk(s, s->current_chunk, sector_num)) {
>           int ret;
>           uint32_t chunk = search_chunk(s, sector_num);
> +        uint64_t total_out;

This breaks compilation when configured with --disable-bzip2. (unused 
variable)

>
>           if (chunk >= s->n_chunks) {
>               return -1;
>           }
>
>           s->current_chunk = s->n_chunks;
> -        switch (s->types[chunk]) {
> +        switch (s->types[chunk]) { /* block entry type */
>           case 0x80000005: { /* zlib compressed */
>               /* we need to buffer, because only the chunk as whole can be
>                * inflated. */
> @@ -564,6 +586,34 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
>                   return -1;
>               }
>               break; }
> +#ifdef CONFIG_BZIP2
> +        case 0x80000006: /* bzip2 compressed */
> +            /* we need to buffer, because only the chunk as whole can be
> +             * inflated. */
> +            ret = bdrv_pread(bs->file, s->offsets[chunk],
> +                             s->compressed_chunk, s->lengths[chunk]);
> +            if (ret != s->lengths[chunk]) {
> +                return -1;
> +            }
> +
> +            ret = BZ2_bzDecompressInit(&s->bzstream, 0, 0);
> +            if (ret != BZ_OK) {
> +                return -1;
> +            }
> +            s->bzstream.next_in = (char *)s->compressed_chunk;
> +            s->bzstream.avail_in = (unsigned int) s->lengths[chunk];
> +            s->bzstream.next_out = (char *)s->uncompressed_chunk;
> +            s->bzstream.avail_out = (unsigned int) 512 * s->sectorcounts[chunk];
> +            ret = BZ2_bzDecompress(&s->bzstream);
> +            total_out = ((uint64_t)s->bzstream.total_out_hi32 << 32) +
> +                        s->bzstream.total_out_lo32;
> +            BZ2_bzDecompressEnd(&s->bzstream);
> +            if (ret != BZ_STREAM_END ||
> +                total_out != 512 * s->sectorcounts[chunk]) {
> +                return -1;
> +            }
> +            break;
> +#endif /* CONFIG_BZIP2 */
>           case 1: /* copy */
>               ret = bdrv_pread(bs->file, s->offsets[chunk],
>                                s->uncompressed_chunk, s->lengths[chunk]);
> diff --git a/configure b/configure
> index cae588c..eadae63 100755
> --- a/configure
> +++ b/configure
> @@ -313,6 +313,7 @@ glx=""
>   zlib="yes"
>   lzo=""
>   snappy=""
> +bzip2=""
>   guest_agent=""
>   guest_agent_with_vss="no"
>   vss_win32_sdk=""
> @@ -1060,6 +1061,10 @@ for opt do
>     ;;
>     --enable-snappy) snappy="yes"
>     ;;
> +  --disable-bzip2) bzip2="no"
> +  ;;
> +  --enable-bzip2) bzip2="yes"
> +  ;;
>     --enable-guest-agent) guest_agent="yes"
>     ;;
>     --disable-guest-agent) guest_agent="no"
> @@ -1374,6 +1379,8 @@ Advanced options (experts only):
>     --enable-usb-redir       enable usb network redirection support
>     --enable-lzo             enable the support of lzo compression library
>     --enable-snappy          enable the support of snappy compression library
> +  --enable-bzip2           enable the support of bzip2 compression library (for
> +                           reading bzip2-compressed dmg images)
>     --disable-guest-agent    disable building of the QEMU Guest Agent
>     --enable-guest-agent     enable building of the QEMU Guest Agent
>     --with-vss-sdk=SDK-path  enable Windows VSS support in QEMU Guest Agent
> @@ -1820,6 +1827,25 @@ EOF
>   fi
>
>   ##########################################
> +# bzip2 check
> +
> +if test "$bzip2" != "no" ; then
> +    cat > $TMPC << EOF
> +#include <bzlib.h>
> +int main(void) { BZ2_bzlibVersion(); return 0; }
> +EOF
> +    if compile_prog "" "-lbz2" ; then
> +        libs_softmmu="$libs_softmmu -lbz2"
> +        bzip2="yes"
> +    else
> +        if test "$bzip2" = "yes"; then
> +            feature_not_found "libbzip2" "Install libbzip2 devel"
> +        fi
> +        bzip2="no"
> +    fi
> +fi
> +
> +##########################################
>   # libseccomp check
>
>   if test "$seccomp" != "no" ; then
> @@ -4340,6 +4366,7 @@ echo "vhdx              $vhdx"
>   echo "Quorum            $quorum"
>   echo "lzo support       $lzo"
>   echo "snappy support    $snappy"
> +echo "bzip2 support     $bzip2"
>   echo "NUMA host support $numa"
>
>   if test "$sdl_too_old" = "yes"; then
> @@ -4695,6 +4722,10 @@ if test "$snappy" = "yes" ; then
>     echo "CONFIG_SNAPPY=y" >> $config_host_mak
>   fi
>
> +if test "$bzip2" = "yes" ; then
> +  echo "CONFIG_BZIP2=y" >> $config_host_mak
> +fi
> +
>   if test "$libiscsi" = "yes" ; then
>     echo "CONFIG_LIBISCSI=m" >> $config_host_mak
>     echo "LIBISCSI_CFLAGS=$libiscsi_cflags" >> $config_host_mak
>

Looks good otherwise. CCing Paolo so he can take a quick peek at the 
configure script. It looks sane to me, though.

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 10/10] block/dmg: improve zeroes handling
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 10/10] block/dmg: improve zeroes handling Peter Wu
@ 2015-01-05 19:48   ` John Snow
  2015-01-06  0:21     ` Peter Wu
  0 siblings, 1 reply; 39+ messages in thread
From: John Snow @ 2015-01-05 19:48 UTC (permalink / raw)
  To: Peter Wu, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi



On 12/27/2014 10:01 AM, Peter Wu wrote:
> Disk images may contain large all-zeroes gaps (1.66k sectors or 812 MiB
> is seen in the real world). These blocks (type 2) do not need to be
> extracted into a temporary buffer, there is no need to allocate memory
> for these blocks nor to check its length.
>
> (For the test image, the maximum uncompressed size is 1054371 bytes,
> probably for a bzip2-compressed block.)
>
> Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> ---
>   block/dmg.c | 18 +++++++++++++-----
>   1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/block/dmg.c b/block/dmg.c
> index 67d4e2b..b84ba7e 100644
> --- a/block/dmg.c
> +++ b/block/dmg.c
> @@ -137,7 +137,9 @@ static void update_max_chunk_size(BDRVDMGState *s, uint32_t chunk,
>           uncompressed_sectors = (s->lengths[chunk] + 511) / 512;
>           break;
>       case 2: /* zero */
> -        uncompressed_sectors = s->sectorcounts[chunk];
> +        /* as the all-zeroes block may be large, it is treated specially: the
> +         * sector is not copied from a large buffer, a simple memset is used
> +         * instead. Therefore uncompressed_sectors does not need to be set. */
>           break;
>       }
>
> @@ -260,7 +262,9 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds,
>           s->sectorcounts[i] = buff_read_uint64(buffer, offset);
>           offset += 8;
>
> -        if (s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
> +        /* all-zeroes sector (type 2) does not need to be "uncompressed" and can
> +         * therefore be unbounded. */
> +        if (s->types[i] != 2 && s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
>               error_report("sector count %" PRIu64 " for chunk %" PRIu32
>                            " is larger than max (%u)",
>                            s->sectorcounts[i], i, DMG_SECTORCOUNTS_MAX);
> @@ -621,9 +625,6 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
>                   return -1;
>               }
>               break;
> -        case 2: /* zero */
> -            memset(s->uncompressed_chunk, 0, 512 * s->sectorcounts[chunk]);
> -            break;

This is just a personal preference, but I might actually prefer a little 
comment here that guards against anyone helpfully re-adding this case 
statement again in the future, if you really want to split it out.

>           }
>           s->current_chunk = chunk;
>       }
> @@ -641,6 +642,13 @@ static int dmg_read(BlockDriverState *bs, int64_t sector_num,
>           if (dmg_read_chunk(bs, sector_num + i) != 0) {
>               return -1;
>           }
> +        /* Special case: current chunk is all zeroes. Do not perform a memcpy as
> +         * s->uncompressed_chunk may be too small to cover the large all-zeroes
> +         * section. */
> +        if (s->types[s->current_chunk] == 2) { /* all zeroes block entry */
> +            memset(buf + i * 512, 0, 512);
> +            continue;
> +        }

Why even call dmg_read_chunk first if we just ignore it? You can 
probably move this special case forward and only do the dmg_read_chunk 
if it fails.

Or maybe we can keep this from becoming too fragmented by giving the 
read_chunk function access to the destination buffer somehow (parameter, 
structure member) and keeping all the logic for re-inflating the chunks 
together at the same layer.

>           sector_offset_in_chunk = sector_num + i - s->sectors[s->current_chunk];
>           memcpy(buf + i * 512,
>                  s->uncompressed_chunk + sector_offset_in_chunk * 512, 512);
>

Looks good, otherwise. Thank you :)

--j

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 10/10] block/dmg: improve zeroes handling
  2015-01-05 19:48   ` John Snow
@ 2015-01-06  0:21     ` Peter Wu
  0 siblings, 0 replies; 39+ messages in thread
From: Peter Wu @ 2015-01-06  0:21 UTC (permalink / raw)
  To: John Snow; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

On Monday 05 January 2015 14:48:03 John Snow wrote:
> On 12/27/2014 10:01 AM, Peter Wu wrote:
> > Disk images may contain large all-zeroes gaps (1.66k sectors or 812 MiB
> > is seen in the real world). These blocks (type 2) do not need to be
> > extracted into a temporary buffer, there is no need to allocate memory
> > for these blocks nor to check its length.
> >
> > (For the test image, the maximum uncompressed size is 1054371 bytes,
> > probably for a bzip2-compressed block.)
> >
> > Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> > ---
> >   block/dmg.c | 18 +++++++++++++-----
> >   1 file changed, 13 insertions(+), 5 deletions(-)
> >
> > diff --git a/block/dmg.c b/block/dmg.c
> > index 67d4e2b..b84ba7e 100644
> > --- a/block/dmg.c
> > +++ b/block/dmg.c
> > @@ -137,7 +137,9 @@ static void update_max_chunk_size(BDRVDMGState *s, uint32_t chunk,
> >           uncompressed_sectors = (s->lengths[chunk] + 511) / 512;
> >           break;
> >       case 2: /* zero */
> > -        uncompressed_sectors = s->sectorcounts[chunk];
> > +        /* as the all-zeroes block may be large, it is treated specially: the
> > +         * sector is not copied from a large buffer, a simple memset is used
> > +         * instead. Therefore uncompressed_sectors does not need to be set. */
> >           break;
> >       }
> >
> > @@ -260,7 +262,9 @@ static int dmg_read_mish_block(BDRVDMGState *s, DmgHeaderState *ds,
> >           s->sectorcounts[i] = buff_read_uint64(buffer, offset);
> >           offset += 8;
> >
> > -        if (s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
> > +        /* all-zeroes sector (type 2) does not need to be "uncompressed" and can
> > +         * therefore be unbounded. */
> > +        if (s->types[i] != 2 && s->sectorcounts[i] > DMG_SECTORCOUNTS_MAX) {
> >               error_report("sector count %" PRIu64 " for chunk %" PRIu32
> >                            " is larger than max (%u)",
> >                            s->sectorcounts[i], i, DMG_SECTORCOUNTS_MAX);
> > @@ -621,9 +625,6 @@ static inline int dmg_read_chunk(BlockDriverState *bs, uint64_t sector_num)
> >                   return -1;
> >               }
> >               break;
> > -        case 2: /* zero */
> > -            memset(s->uncompressed_chunk, 0, 512 * s->sectorcounts[chunk]);
> > -            break;
> 
> This is just a personal preference, but I might actually prefer a little 
> comment here that guards against anyone helpfully re-adding this case 
> statement again in the future, if you really want to split it out.

Ok, makes sense. I'll add an appropriate comment here.

> >           }
> >           s->current_chunk = chunk;
> >       }
> > @@ -641,6 +642,13 @@ static int dmg_read(BlockDriverState *bs, int64_t sector_num,
> >           if (dmg_read_chunk(bs, sector_num + i) != 0) {
> >               return -1;
> >           }
> > +        /* Special case: current chunk is all zeroes. Do not perform a memcpy as
> > +         * s->uncompressed_chunk may be too small to cover the large all-zeroes
> > +         * section. */
> > +        if (s->types[s->current_chunk] == 2) { /* all zeroes block entry */
> > +            memset(buf + i * 512, 0, 512);
> > +            continue;
> > +        }
> 
> Why even call dmg_read_chunk first if we just ignore it? You can 
> probably move this special case forward and only do the dmg_read_chunk 
> if it fails.

dmg_read_chunk is needed to find the current sector (which requires a
binary search if the sector is not within the current chunk). I have
updated the comments to clarify this.

> Or maybe we can keep this from becoming too fragmented by giving the 
> read_chunk function access to the destination buffer somehow (parameter, 
> structure member) and keeping all the logic for re-inflating the chunks 
> together at the same layer.

dmg_read_chunk could be modified to return a pointer inside
s->uncompressed_chunk (adjusted to the sector number), but this just
moves the special case inside dmg_read_chunk. Alternatively, the whole
memcpy/memset stuff could be moved into dmg_read_chunk, but then the
function could be renamed to dmg_read_into_sector.

I treat dmg_read_chunk as a function which reads the current chunk (if
not already read), and returns the cached chunk otherwise. Then the
caller can decide on the part of the chunk it is interested in. As the
zero chunk does not really update the "current chunk contents", it'll
need special treatment.

The "continue" keyword is used while it actually means "either memset or
memcpy", this approach was chosen to avoid re-indenting the below memcpy
code.

Kind regards,
Peter

> >           sector_offset_in_chunk = sector_num + i - s->sectors[s->current_chunk];
> >           memcpy(buf + i * 512,
> >                  s->uncompressed_chunk + sector_offset_in_chunk * 512, 512);
> >
> 
> Looks good, otherwise. Thank you :)
> 
> --j

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 01/10] block/dmg: properly detect the UDIF trailer
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 01/10] block/dmg: properly detect the UDIF trailer Peter Wu
  2015-01-02 23:58   ` John Snow
@ 2015-01-06 13:35   ` Stefan Hajnoczi
  1 sibling, 0 replies; 39+ messages in thread
From: Stefan Hajnoczi @ 2015-01-06 13:35 UTC (permalink / raw)
  To: Peter Wu; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 2373 bytes --]

On Sat, Dec 27, 2014 at 04:01:35PM +0100, Peter Wu wrote:
> diff --git a/block/dmg.c b/block/dmg.c
> index e455886..df274f9 100644
> --- a/block/dmg.c
> +++ b/block/dmg.c
> @@ -131,6 +131,39 @@ static void update_max_chunk_size(BDRVDMGState *s, uint32_t chunk,
>      }
>  }
>  
> +static int64_t dmg_find_koly_offset(BlockDriverState *file_bs)
> +{
> +    int64_t length;
> +    int64_t offset = 0;
> +    uint8_t buffer[515];
> +    int i, ret;
> +
> +    /* bdrv_getlength returns a multiple of block size (512), rounded up. Since
> +     * dmg images can have odd sizes, try to look for the "koly" magic which
> +     * marks the begin of the UDIF trailer (512 bytes). This magic can be found
> +     * in the last 511 bytes of the second-last sector or the first 4 bytes of
> +     * the last sector (search space: 515 bytes) */
> +    length = bdrv_getlength(file_bs);
> +    if (length < 512) {
> +        return length < 0 ? length : -EINVAL;

dmg_open() should pass in Error *errp so a detailed error reporting can
be used:

if (length < 0) {
    error_setg_errno(errp, -length, "Failed to get file size while reading UDIF trailer");
    return length;
} else if (length < 512) {
    error_set(errp, "dmg file must be at least 512 bytes long");
    return -EINVAL;
}

This makes it much easier to pinpoint errors (instead of just -EINVAL)
and also gives the user a hint about the cause.

> +    }
> +    if (length > 511 + 512) {
> +        offset = length - 511 - 512;
> +    }
> +    length = length < 515 ? length : 515;
> +    ret = bdrv_pread(file_bs, offset, buffer, length);
> +    if (ret < 4) {
> +        return ret < 0 ? ret : -EINVAL;

bdrv_pread() does not return short reads.  The return value will either
be length or an error.  This could be just:

if (ret < 0) {
    error_setg_errno(errp, -ret, "Failed to read last sectors in dmg file");
    return ret;
}

(The unique error string makes it easy to track down the location where
the error occurs.)

> +    }
> +    for (i = 0; i < length - 3; i++) {
> +        if (buffer[i] == 'k' && buffer[i+1] == 'o' &&
> +            buffer[i+2] == 'l' && buffer[i+3] == 'y') {
> +            return offset + i;
> +        }
> +    }
> +    return -EINVAL;

error_set(errp, "Not a dmg file, unable to find UDIF footer");
return -EINVAL;

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 02/10] block/dmg: extract mish block decoding functionality
  2014-12-27 15:01 ` [Qemu-devel] [PATCH 02/10] block/dmg: extract mish block decoding functionality Peter Wu
  2015-01-02 23:59   ` John Snow
@ 2015-01-06 13:42   ` Stefan Hajnoczi
  1 sibling, 0 replies; 39+ messages in thread
From: Stefan Hajnoczi @ 2015-01-06 13:42 UTC (permalink / raw)
  To: Peter Wu; +Cc: Kevin Wolf, qemu-devel, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 619 bytes --]

On Sat, Dec 27, 2014 at 04:01:36PM +0100, Peter Wu wrote:
> Extract the mish block decoder such that this can be used for other
> formats in the future. A new DmgHeaderState struct is introduced to
> share state while decoding.
> 
> The code is kept unchanged as much as possible, a "fail" label is added
> for example where a simple return would probably do.
> 
> Signed-off-by: Peter Wu <peter@lekensteyn.nl>
> ---
>  block/dmg.c | 216 +++++++++++++++++++++++++++++++++++-------------------------
>  1 file changed, 125 insertions(+), 91 deletions(-)

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

[-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 09/10] block/dmg: support bzip2 block entry types
  2015-01-05 19:32   ` John Snow
@ 2015-01-07 10:29     ` Paolo Bonzini
  2015-01-07 10:31       ` Peter Wu
  0 siblings, 1 reply; 39+ messages in thread
From: Paolo Bonzini @ 2015-01-07 10:29 UTC (permalink / raw)
  To: John Snow, Peter Wu, qemu-devel; +Cc: Kevin Wolf, Stefan Hajnoczi



On 05/01/2015 20:32, John Snow wrote:
>>
>>   fi
>>
>>   ##########################################
>> +# bzip2 check
>> +
>> +if test "$bzip2" != "no" ; then
>> +    cat > $TMPC << EOF
>> +#include <bzlib.h>
>> +int main(void) { BZ2_bzlibVersion(); return 0; }
>> +EOF
>> +    if compile_prog "" "-lbz2" ; then
>> +        libs_softmmu="$libs_softmmu -lbz2"
>> +        bzip2="yes"
>> +    else
>> +        if test "$bzip2" = "yes"; then
>> +            feature_not_found "libbzip2" "Install libbzip2 devel"
>> +        fi
>> +        bzip2="no"
>> +    fi
>> +fi
>> +
>> +##########################################
>>   # libseccomp check
>>
>>   if test "$seccomp" != "no" ; then
>> @@ -4340,6 +4366,7 @@ echo "vhdx              $vhdx"
>>   echo "Quorum            $quorum"
>>   echo "lzo support       $lzo"
>>   echo "snappy support    $snappy"
>> +echo "bzip2 support     $bzip2"
>>   echo "NUMA host support $numa"
>>
>>   if test "$sdl_too_old" = "yes"; then
>> @@ -4695,6 +4722,10 @@ if test "$snappy" = "yes" ; then
>>     echo "CONFIG_SNAPPY=y" >> $config_host_mak
>>   fi
>>
>> +if test "$bzip2" = "yes" ; then
>> +  echo "CONFIG_BZIP2=y" >> $config_host_mak
>> +fi
>> +
>>   if test "$libiscsi" = "yes" ; then
>>     echo "CONFIG_LIBISCSI=m" >> $config_host_mak
>>     echo "LIBISCSI_CFLAGS=$libiscsi_cflags" >> $config_host_mak
>>
> 
> Looks good otherwise. CCing Paolo so he can take a quick peek at the
> configure script. It looks sane to me, though.

It is sane, but instead of libs_softmmu="$libs_softmmu -lbz2" it's
better to follow what we do for libiscsi.  In configure:

CONFIG_BZIP2=y
LIBBZ2_CFLAGS=-lbz2

In Makefile.objs:

dmg.o-libs     := $(LIBBZ2_CFLAGS)

Paolo

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 09/10] block/dmg: support bzip2 block entry types
  2015-01-07 10:29     ` Paolo Bonzini
@ 2015-01-07 10:31       ` Peter Wu
  2015-01-07 10:53         ` Paolo Bonzini
  0 siblings, 1 reply; 39+ messages in thread
From: Peter Wu @ 2015-01-07 10:31 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Kevin Wolf, John Snow, qemu-devel, Stefan Hajnoczi

On Wednesday 07 January 2015 11:29:20 Paolo Bonzini wrote:
> 
> On 05/01/2015 20:32, John Snow wrote:
> >>
> >>   fi
> >>
> >>   ##########################################
> >> +# bzip2 check
> >> +
> >> +if test "$bzip2" != "no" ; then
> >> +    cat > $TMPC << EOF
> >> +#include <bzlib.h>
> >> +int main(void) { BZ2_bzlibVersion(); return 0; }
> >> +EOF
> >> +    if compile_prog "" "-lbz2" ; then
> >> +        libs_softmmu="$libs_softmmu -lbz2"
> >> +        bzip2="yes"
> >> +    else
> >> +        if test "$bzip2" = "yes"; then
> >> +            feature_not_found "libbzip2" "Install libbzip2 devel"
> >> +        fi
> >> +        bzip2="no"
> >> +    fi
> >> +fi
> >> +
> >> +##########################################
> >>   # libseccomp check
> >>
> >>   if test "$seccomp" != "no" ; then
> >> @@ -4340,6 +4366,7 @@ echo "vhdx              $vhdx"
> >>   echo "Quorum            $quorum"
> >>   echo "lzo support       $lzo"
> >>   echo "snappy support    $snappy"
> >> +echo "bzip2 support     $bzip2"
> >>   echo "NUMA host support $numa"
> >>
> >>   if test "$sdl_too_old" = "yes"; then
> >> @@ -4695,6 +4722,10 @@ if test "$snappy" = "yes" ; then
> >>     echo "CONFIG_SNAPPY=y" >> $config_host_mak
> >>   fi
> >>
> >> +if test "$bzip2" = "yes" ; then
> >> +  echo "CONFIG_BZIP2=y" >> $config_host_mak
> >> +fi
> >> +
> >>   if test "$libiscsi" = "yes" ; then
> >>     echo "CONFIG_LIBISCSI=m" >> $config_host_mak
> >>     echo "LIBISCSI_CFLAGS=$libiscsi_cflags" >> $config_host_mak
> >>
> > 
> > Looks good otherwise. CCing Paolo so he can take a quick peek at the
> > configure script. It looks sane to me, though.
> 
> It is sane, but instead of libs_softmmu="$libs_softmmu -lbz2" it's
> better to follow what we do for libiscsi.  In configure:
> 
> CONFIG_BZIP2=y
> LIBBZ2_CFLAGS=-lbz2
> 
> In Makefile.objs:
> 
> dmg.o-libs     := $(LIBBZ2_CFLAGS)
> 
> Paolo

Already taken care of in v2 which was posted to the list yesterday :-)

I named it "BZIP2_LIBS" though (matching the package name).
-- 
Kind regards,
Peter
https://lekensteyn.nl

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [Qemu-devel] [PATCH 09/10] block/dmg: support bzip2 block entry types
  2015-01-07 10:31       ` Peter Wu
@ 2015-01-07 10:53         ` Paolo Bonzini
  0 siblings, 0 replies; 39+ messages in thread
From: Paolo Bonzini @ 2015-01-07 10:53 UTC (permalink / raw)
  To: Peter Wu; +Cc: Kevin Wolf, John Snow, qemu-devel, Stefan Hajnoczi



On 07/01/2015 11:31, Peter Wu wrote:
> Already taken care of in v2 which was posted to the list yesterday :-)
> 
> I named it "BZIP2_LIBS" though (matching the package name).

Great. :)

Paolo

^ permalink raw reply	[flat|nested] 39+ messages in thread

end of thread, other threads:[~2015-01-07 14:33 UTC | newest]

Thread overview: 39+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-12-27 15:01 [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support Peter Wu
2014-12-27 15:01 ` [Qemu-devel] [PATCH 01/10] block/dmg: properly detect the UDIF trailer Peter Wu
2015-01-02 23:58   ` John Snow
2015-01-03  9:39     ` Peter Wu
2015-01-06 13:35   ` Stefan Hajnoczi
2014-12-27 15:01 ` [Qemu-devel] [PATCH 02/10] block/dmg: extract mish block decoding functionality Peter Wu
2015-01-02 23:59   ` John Snow
2015-01-03 11:05     ` Peter Wu
2015-01-06 13:42   ` Stefan Hajnoczi
2014-12-27 15:01 ` [Qemu-devel] [PATCH 03/10] block/dmg: extract processing of resource forks Peter Wu
2015-01-03  0:01   ` John Snow
2015-01-03 11:24     ` Peter Wu
2014-12-27 15:01 ` [Qemu-devel] [PATCH 04/10] block/dmg: process a buffer instead of reading ints Peter Wu
2015-01-03  0:01   ` John Snow
2014-12-27 15:01 ` [Qemu-devel] [PATCH 05/10] block/dmg: validate chunk size to avoid overflow Peter Wu
2015-01-03  0:02   ` John Snow
2014-12-27 15:01 ` [Qemu-devel] [PATCH 06/10] block/dmg: process XML plists Peter Wu
2015-01-03  0:04   ` John Snow
2015-01-03 11:54     ` Peter Wu
2015-01-05 16:46       ` John Snow
2015-01-05 16:54   ` John Snow
2014-12-27 15:01 ` [Qemu-devel] [PATCH 07/10] block/dmg: set virtual size to a non-zero value Peter Wu
2015-01-03  0:04   ` John Snow
2014-12-27 15:01 ` [Qemu-devel] [PATCH 08/10] block/dmg: fix sector data offset calculation Peter Wu
2015-01-03  0:05   ` John Snow
2015-01-03 12:47     ` Peter Wu
2014-12-27 15:01 ` [Qemu-devel] [PATCH 09/10] block/dmg: support bzip2 block entry types Peter Wu
2015-01-05 19:32   ` John Snow
2015-01-07 10:29     ` Paolo Bonzini
2015-01-07 10:31       ` Peter Wu
2015-01-07 10:53         ` Paolo Bonzini
2014-12-27 15:01 ` [Qemu-devel] [PATCH 10/10] block/dmg: improve zeroes handling Peter Wu
2015-01-05 19:48   ` John Snow
2015-01-06  0:21     ` Peter Wu
2015-01-02 14:14 ` [Qemu-devel] [PATCH 00/10] block/dmg: (compatibility) fixes and bzip2 support Stefan Hajnoczi
2015-01-02 16:31   ` John Snow
2015-01-02 18:46     ` Peter Wu
2015-01-02 18:58       ` John Snow
2015-01-02 21:49         ` Peter Wu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.