All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata
@ 2013-07-16 16:29 Paolo Bonzini
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 01/17] cow: make reads go at a decent speed Paolo Bonzini
                   ` (20 more replies)
  0 siblings, 21 replies; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-16 16:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: pl, famz, stefanha

This series adds a subcommand to "map" that can dump file metadata.
Metadata that is dumped includes:

- whether blocks are allocated in bs->file and, if so, where

- whether blocks are zero

- whether data is read from bs or bs->backing_hd

Metadata is dumped for an entire chain of images.  One example usage is
(for non-compressed, non-encrypted images) to transform the metadata
into a Linux device-mapper setup, and make a qcow2 image available (for
read only) as a block device.  Another possible usage is to determine
the used areas of a file, and convert it in place to another format.
Alternatively, the richer data can be used by block jobs to quickly
determine if a block is zero without reading it.

The patches implement a new operation, bdrv_co_get_block_status, that
supersedes bdrv_co_is_allocated.  The bdrv_is_allocated API is still
available of course, and implemented on top of the new operation.

Patches 1-3 touch cow, which uses bdrv_co_is_allocated during its reads
and writes.  I optimized it a bit because it was unbearably slow during
testing.  With these patches (also tested with blkverify), booting of
a cow guest image is not particularly slow.

Patches 4 to 6 clean up the bdrv_is_allocated and bdrv_is_allocated_above
implementation, eliminating some code duplication.

Patches 7 and 8 tweak bdrv_has_zero_init and its usage in qemu-img in
a way that helps when implementing the new API.

Patches 9 to 11 implement bdrv_get_block_status and change the formats
to report all the available information.

Patch 12 adds the "map" subcommand to qemu-img.

Finally, patches 13 to 17 tweak the implementation to extract more
information from protocols, and combine this information with format
metadata whenever possible.

Paolo

v1->v2: see patches 1, 6, 9, 10, 11, 12

Paolo Bonzini (17):
  cow: make reads go at a decent speed
  cow: make writes go at a less indecent speed
  cow: do not call bdrv_co_is_allocated
  block: make bdrv_co_is_allocated static
  block: remove bdrv_is_allocated_above/bdrv_co_is_allocated_above
    distinction
  block: expect errors from bdrv_co_is_allocated
  qemu-img: always probe the input image for allocated sectors
  block: make bdrv_has_zero_init return false for copy-on-write-images
  block: introduce bdrv_get_block_status API
  block: define get_block_status return value
  block: return get_block_status data and flags for formats
  qemu-img: add a "map" subcommand
  block: use bdrv_has_zero_init to return BDRV_BLOCK_ZERO
  raw-posix: return get_block_status data and flags
  raw-posix: detect XFS unwritten extents
  block: add default get_block_status implementation for protocols
  block: look for zero blocks in bs->file

 block.c                   | 134 +++++++++++++--------------
 block/commit.c            |   6 +-
 block/cow.c               |  90 +++++++++++++------
 block/mirror.c            |   4 +-
 block/qcow.c              |  13 ++-
 block/qcow2.c             |  24 +++--
 block/qed.c               |  39 ++++++--
 block/raw-posix.c         |  24 +++--
 block/raw.c               |   6 +-
 block/sheepdog.c          |  14 +--
 block/stream.c            |  10 +--
 block/vdi.c               |  17 +++-
 block/vmdk.c              |  23 ++++-
 block/vvfat.c             |  15 ++--
 include/block/block.h     |  34 +++++--
 include/block/block_int.h |   2 +-
 qemu-img-cmds.hx          |   6 ++
 qemu-img.c                | 225 +++++++++++++++++++++++++++++++++++++++++-----
 18 files changed, 503 insertions(+), 183 deletions(-)

-- 
1.8.3.1

^ permalink raw reply	[flat|nested] 64+ messages in thread

* [Qemu-devel] [PATCH v2 01/17] cow: make reads go at a decent speed
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
@ 2013-07-16 16:29 ` Paolo Bonzini
  2013-07-19 12:31   ` Eric Blake
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 02/17] cow: make writes go at a less indecent speed Paolo Bonzini
                   ` (19 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-16 16:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: pl, famz, stefanha

Do not do two reads for each sector; load each sector of the bitmap
and use bitmap operations to process it.

Writes are still dog slow!

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: use BDRV_SECTOR_SIZE, not 512 for bitmap array length

 block/cow.c | 54 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 32 insertions(+), 22 deletions(-)

diff --git a/block/cow.c b/block/cow.c
index 1cc2e89..fa3d41f 100644
--- a/block/cow.c
+++ b/block/cow.c
@@ -126,18 +126,31 @@ static inline int cow_set_bit(BlockDriverState *bs, int64_t bitnum)
     return 0;
 }
 
-static inline int is_bit_set(BlockDriverState *bs, int64_t bitnum)
+#define BITS_PER_BITMAP_SECTOR (512 * 8)
+
+/* Cannot use bitmap.c on big-endian machines.  */
+static int cow_test_bit(int64_t bitnum, const uint8_t *bitmap)
 {
-    uint64_t offset = sizeof(struct cow_header_v2) + bitnum / 8;
-    uint8_t bitmap;
-    int ret;
+    return (bitmap[bitnum / 8] & (1 << (bitnum & 7))) != 0;
+}
 
-    ret = bdrv_pread(bs->file, offset, &bitmap, sizeof(bitmap));
-    if (ret < 0) {
-       return ret;
+static int cow_find_streak(const uint8_t *bitmap, int value, int start, int nb_sectors)
+{
+    int streak_value = value ? 0xFF : 0;
+    int last = MIN(start + nb_sectors, BITS_PER_BITMAP_SECTOR);
+    int bitnum = start;
+    while (bitnum < last) {
+        if ((bitnum & 7) == 0 && bitmap[bitnum / 8] == streak_value) {
+            bitnum += 8;
+            continue;
+        }
+        if (cow_test_bit(bitnum, bitmap) == value) {
+            bitnum++;
+            continue;
+        }
+        break;
     }
-
-    return !!(bitmap & (1 << (bitnum % 8)));
+    return MIN(bitnum, last) - start;
 }
 
 /* Return true if first block has been changed (ie. current version is
@@ -146,23 +159,20 @@ static inline int is_bit_set(BlockDriverState *bs, int64_t bitnum)
 static int coroutine_fn cow_co_is_allocated(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors, int *num_same)
 {
+    int64_t bitnum = sector_num + sizeof(struct cow_header_v2) * 8;
+    uint64_t offset = (bitnum / 8) & -BDRV_SECTOR_SIZE;
+    uint8_t bitmap[BDRV_SECTOR_SIZE];
+    int ret;
     int changed;
 
-    if (nb_sectors == 0) {
-	*num_same = nb_sectors;
-	return 0;
-    }
-
-    changed = is_bit_set(bs, sector_num);
-    if (changed < 0) {
-        return 0; /* XXX: how to return I/O errors? */
-    }
-
-    for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
-	if (is_bit_set(bs, sector_num + *num_same) != changed)
-	    break;
+    ret = bdrv_pread(bs->file, offset, &bitmap, sizeof(bitmap));
+    if (ret < 0) {
+        return ret;
     }
 
+    bitnum &= BITS_PER_BITMAP_SECTOR - 1;
+    changed = cow_test_bit(bitnum, bitmap);
+    *num_same = cow_find_streak(bitmap, changed, bitnum, nb_sectors);
     return changed;
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [PATCH v2 02/17] cow: make writes go at a less indecent speed
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 01/17] cow: make reads go at a decent speed Paolo Bonzini
@ 2013-07-16 16:29 ` Paolo Bonzini
  2013-07-19 12:40   ` Eric Blake
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 03/17] cow: do not call bdrv_co_is_allocated Paolo Bonzini
                   ` (18 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-16 16:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: pl, famz, stefanha

Only sync once per write, rather than once per sector.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/cow.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/block/cow.c b/block/cow.c
index fa3d41f..304e4c7 100644
--- a/block/cow.c
+++ b/block/cow.c
@@ -106,7 +106,7 @@ static int cow_open(BlockDriverState *bs, QDict *options, int flags)
  * XXX(hch): right now these functions are extremely inefficient.
  * We should just read the whole bitmap we'll need in one go instead.
  */
-static inline int cow_set_bit(BlockDriverState *bs, int64_t bitnum)
+static inline int cow_set_bit(BlockDriverState *bs, int64_t bitnum, bool *first)
 {
     uint64_t offset = sizeof(struct cow_header_v2) + bitnum / 8;
     uint8_t bitmap;
@@ -117,9 +117,21 @@ static inline int cow_set_bit(BlockDriverState *bs, int64_t bitnum)
        return ret;
     }
 
+    if (bitmap & (1 << (bitnum % 8))) {
+        return 0;
+    }
+
+    if (*first) {
+        ret = bdrv_flush(bs->file);
+        if (ret < 0) {
+           return ret;
+        }
+        *first = false;
+    }
+
     bitmap |= (1 << (bitnum % 8));
 
-    ret = bdrv_pwrite_sync(bs->file, offset, &bitmap, sizeof(bitmap));
+    ret = bdrv_pwrite(bs->file, offset, &bitmap, sizeof(bitmap));
     if (ret < 0) {
        return ret;
     }
@@ -181,9 +193,10 @@ static int cow_update_bitmap(BlockDriverState *bs, int64_t sector_num,
 {
     int error = 0;
     int i;
+    bool first = true;
 
     for (i = 0; i < nb_sectors; i++) {
-        error = cow_set_bit(bs, sector_num + i);
+        error = cow_set_bit(bs, sector_num + i, &first);
         if (error) {
             break;
         }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [PATCH v2 03/17] cow: do not call bdrv_co_is_allocated
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 01/17] cow: make reads go at a decent speed Paolo Bonzini
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 02/17] cow: make writes go at a less indecent speed Paolo Bonzini
@ 2013-07-16 16:29 ` Paolo Bonzini
  2013-07-19 12:42   ` Eric Blake
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 04/17] block: make bdrv_co_is_allocated static Paolo Bonzini
                   ` (17 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-16 16:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: pl, famz, stefanha

As we change bdrv_is_allocated to gather more information from bs and
bs->file, it will become a bit slower.  It is still appropriate for online
jobs, but not for reads/writes.  Call the internal function instead.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/cow.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/block/cow.c b/block/cow.c
index 304e4c7..2994f8d 100644
--- a/block/cow.c
+++ b/block/cow.c
@@ -46,6 +46,9 @@ typedef struct BDRVCowState {
     int64_t cow_sectors_offset;
 } BDRVCowState;
 
+static int coroutine_fn cow_co_is_allocated(BlockDriverState *bs,
+        int64_t sector_num, int nb_sectors, int *num_same);
+
 static int cow_probe(const uint8_t *buf, int buf_size, const char *filename)
 {
     const struct cow_header_v2 *cow_header = (const void *)buf;
@@ -212,7 +215,7 @@ static int coroutine_fn cow_read(BlockDriverState *bs, int64_t sector_num,
     int ret, n;
 
     while (nb_sectors > 0) {
-        if (bdrv_co_is_allocated(bs, sector_num, nb_sectors, &n)) {
+        if (cow_co_is_allocated(bs, sector_num, nb_sectors, &n)) {
             ret = bdrv_pread(bs->file,
                         s->cow_sectors_offset + sector_num * 512,
                         buf, n * 512);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [PATCH v2 04/17] block: make bdrv_co_is_allocated static
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
                   ` (2 preceding siblings ...)
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 03/17] cow: do not call bdrv_co_is_allocated Paolo Bonzini
@ 2013-07-16 16:29 ` Paolo Bonzini
  2013-07-19 12:44   ` Eric Blake
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 05/17] block: remove bdrv_is_allocated_above/bdrv_co_is_allocated_above distinction Paolo Bonzini
                   ` (16 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-16 16:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: pl, famz, stefanha

bdrv_is_allocated can detect coroutine context and go through a fast
path, similar to other block layer functions.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c               | 24 +++++++++++++++---------
 block/raw.c           |  2 +-
 block/stream.c        |  4 ++--
 include/block/block.h |  2 --
 4 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/block.c b/block.c
index b560241..37e1834 100644
--- a/block.c
+++ b/block.c
@@ -2516,7 +2516,7 @@ static int coroutine_fn bdrv_co_do_readv(BlockDriverState *bs,
     if (flags & BDRV_REQ_COPY_ON_READ) {
         int pnum;
 
-        ret = bdrv_co_is_allocated(bs, sector_num, nb_sectors, &pnum);
+        ret = bdrv_is_allocated(bs, sector_num, nb_sectors, &pnum);
         if (ret < 0) {
             goto out;
         }
@@ -2967,8 +2967,9 @@ typedef struct BdrvCoIsAllocatedData {
  * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
  * beyond the end of the disk image it will be clamped.
  */
-int coroutine_fn bdrv_co_is_allocated(BlockDriverState *bs, int64_t sector_num,
-                                      int nb_sectors, int *pnum)
+static int coroutine_fn bdrv_co_is_allocated(BlockDriverState *bs,
+                                             int64_t sector_num,
+                                             int nb_sectors, int *pnum)
 {
     int64_t n;
 
@@ -3018,10 +3019,15 @@ int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
         .done = false,
     };
 
-    co = qemu_coroutine_create(bdrv_is_allocated_co_entry);
-    qemu_coroutine_enter(co, &data);
-    while (!data.done) {
-        qemu_aio_wait();
+    if (qemu_in_coroutine()) {
+        /* Fast-path if already in coroutine context */
+        bdrv_is_allocated_co_entry(&data);
+    } else {
+        co = qemu_coroutine_create(bdrv_is_allocated_co_entry);
+        qemu_coroutine_enter(co, &data);
+        while (!data.done) {
+            qemu_aio_wait();
+        }
     }
     return data.ret;
 }
@@ -3049,8 +3055,8 @@ int coroutine_fn bdrv_co_is_allocated_above(BlockDriverState *top,
     intermediate = top;
     while (intermediate && intermediate != base) {
         int pnum_inter;
-        ret = bdrv_co_is_allocated(intermediate, sector_num, nb_sectors,
-                                   &pnum_inter);
+        ret = bdrv_is_allocated(intermediate, sector_num, nb_sectors,
+                                &pnum_inter);
         if (ret < 0) {
             return ret;
         } else if (ret) {
diff --git a/block/raw.c b/block/raw.c
index ce10422..df4a868 100644
--- a/block/raw.c
+++ b/block/raw.c
@@ -39,7 +39,7 @@ static int coroutine_fn raw_co_is_allocated(BlockDriverState *bs,
                                             int64_t sector_num,
                                             int nb_sectors, int *pnum)
 {
-    return bdrv_co_is_allocated(bs->file, sector_num, nb_sectors, pnum);
+    return bdrv_is_allocated(bs->file, sector_num, nb_sectors, pnum);
 }
 
 static int64_t raw_getlength(BlockDriverState *bs)
diff --git a/block/stream.c b/block/stream.c
index 7fe9e48..fb1f9c3 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -115,8 +115,8 @@ wait:
             break;
         }
 
-        ret = bdrv_co_is_allocated(bs, sector_num,
-                                   STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
+        ret = bdrv_is_allocated(bs, sector_num,
+                                STREAM_BUFFER_SIZE / BDRV_SECTOR_SIZE, &n);
         if (ret == 1) {
             /* Allocated in the top, no need to copy.  */
             copy = false;
diff --git a/include/block/block.h b/include/block/block.h
index b6b9014..7062a3d 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -179,8 +179,6 @@ int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
  */
 int coroutine_fn bdrv_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
     int nb_sectors);
-int coroutine_fn bdrv_co_is_allocated(BlockDriverState *bs, int64_t sector_num,
-    int nb_sectors, int *pnum);
 int coroutine_fn bdrv_co_is_allocated_above(BlockDriverState *top,
                                             BlockDriverState *base,
                                             int64_t sector_num,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [PATCH v2 05/17] block: remove bdrv_is_allocated_above/bdrv_co_is_allocated_above distinction
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
                   ` (3 preceding siblings ...)
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 04/17] block: make bdrv_co_is_allocated static Paolo Bonzini
@ 2013-07-16 16:29 ` Paolo Bonzini
  2013-07-19 12:46   ` Eric Blake
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 06/17] block: expect errors from bdrv_co_is_allocated Paolo Bonzini
                   ` (15 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-16 16:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: pl, famz, stefanha

Now that bdrv_is_allocated detects coroutine context, the two can
use the same code.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c               | 46 ++++------------------------------------------
 block/commit.c        |  6 +++---
 block/mirror.c        |  4 ++--
 block/stream.c        |  4 ++--
 include/block/block.h |  4 ----
 5 files changed, 11 insertions(+), 53 deletions(-)

diff --git a/block.c b/block.c
index 37e1834..ae87cba 100644
--- a/block.c
+++ b/block.c
@@ -3044,10 +3044,10 @@ int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
  *  allocated/unallocated state.
  *
  */
-int coroutine_fn bdrv_co_is_allocated_above(BlockDriverState *top,
-                                            BlockDriverState *base,
-                                            int64_t sector_num,
-                                            int nb_sectors, int *pnum)
+int coroutine_fn bdrv_is_allocated_above(BlockDriverState *top,
+                                         BlockDriverState *base,
+                                         int64_t sector_num,
+                                         int nb_sectors, int *pnum)
 {
     BlockDriverState *intermediate;
     int ret, n = nb_sectors;
@@ -3083,44 +3083,6 @@ int coroutine_fn bdrv_co_is_allocated_above(BlockDriverState *top,
     return 0;
 }
 
-/* Coroutine wrapper for bdrv_is_allocated_above() */
-static void coroutine_fn bdrv_is_allocated_above_co_entry(void *opaque)
-{
-    BdrvCoIsAllocatedData *data = opaque;
-    BlockDriverState *top = data->bs;
-    BlockDriverState *base = data->base;
-
-    data->ret = bdrv_co_is_allocated_above(top, base, data->sector_num,
-                                           data->nb_sectors, data->pnum);
-    data->done = true;
-}
-
-/*
- * Synchronous wrapper around bdrv_co_is_allocated_above().
- *
- * See bdrv_co_is_allocated_above() for details.
- */
-int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
-                            int64_t sector_num, int nb_sectors, int *pnum)
-{
-    Coroutine *co;
-    BdrvCoIsAllocatedData data = {
-        .bs = top,
-        .base = base,
-        .sector_num = sector_num,
-        .nb_sectors = nb_sectors,
-        .pnum = pnum,
-        .done = false,
-    };
-
-    co = qemu_coroutine_create(bdrv_is_allocated_above_co_entry);
-    qemu_coroutine_enter(co, &data);
-    while (!data.done) {
-        qemu_aio_wait();
-    }
-    return data.ret;
-}
-
 const char *bdrv_get_encrypted_filename(BlockDriverState *bs)
 {
     if (bs->backing_hd && bs->backing_hd->encrypted)
diff --git a/block/commit.c b/block/commit.c
index 2227fc2..37572f0 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -108,9 +108,9 @@ wait:
             break;
         }
         /* Copy if allocated above the base */
-        ret = bdrv_co_is_allocated_above(top, base, sector_num,
-                                         COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
-                                         &n);
+        ret = bdrv_is_allocated_above(top, base, sector_num,
+                                      COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
+                                      &n);
         copy = (ret == 1);
         trace_commit_one_iteration(s, sector_num, n, ret);
         if (copy) {
diff --git a/block/mirror.c b/block/mirror.c
index bed4a7e..0b3c2f6 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -338,8 +338,8 @@ static void coroutine_fn mirror_run(void *opaque)
         base = s->mode == MIRROR_SYNC_MODE_FULL ? NULL : bs->backing_hd;
         for (sector_num = 0; sector_num < end; ) {
             int64_t next = (sector_num | (sectors_per_chunk - 1)) + 1;
-            ret = bdrv_co_is_allocated_above(bs, base,
-                                             sector_num, next - sector_num, &n);
+            ret = bdrv_is_allocated_above(bs, base,
+                                          sector_num, next - sector_num, &n);
 
             if (ret < 0) {
                 goto immediate_exit;
diff --git a/block/stream.c b/block/stream.c
index fb1f9c3..802acae 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -123,8 +123,8 @@ wait:
         } else {
             /* Copy if allocated in the intermediate images.  Limit to the
              * known-unallocated area [sector_num, sector_num+n).  */
-            ret = bdrv_co_is_allocated_above(bs->backing_hd, base,
-                                             sector_num, n, &n);
+            ret = bdrv_is_allocated_above(bs->backing_hd, base,
+	                                   sector_num, n, &n);
 
             /* Finish early if end of backing file has been reached */
             if (ret == 0 && n == 0) {
diff --git a/include/block/block.h b/include/block/block.h
index 7062a3d..b567b7b 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -179,10 +179,6 @@ int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
  */
 int coroutine_fn bdrv_co_write_zeroes(BlockDriverState *bs, int64_t sector_num,
     int nb_sectors);
-int coroutine_fn bdrv_co_is_allocated_above(BlockDriverState *top,
-                                            BlockDriverState *base,
-                                            int64_t sector_num,
-                                            int nb_sectors, int *pnum);
 BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
     const char *backing_file);
 int bdrv_get_backing_file_depth(BlockDriverState *bs);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [PATCH v2 06/17] block: expect errors from bdrv_co_is_allocated
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
                   ` (4 preceding siblings ...)
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 05/17] block: remove bdrv_is_allocated_above/bdrv_co_is_allocated_above distinction Paolo Bonzini
@ 2013-07-16 16:29 ` Paolo Bonzini
  2013-07-19 12:57   ` Eric Blake
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 07/17] qemu-img: always probe the input image for allocated sectors Paolo Bonzini
                   ` (14 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-16 16:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: pl, famz, qemu-stable, stefanha

Some bdrv_is_allocated callers do not expect errors, but the fallback
in qcow2.c might make other callers trip on assertion failures or
infinite loops.

Fix the callers to always look for errors.

Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: modify error message, add strerror(-ret)

 block/qcow2.c  | 4 +---
 block/stream.c | 2 +-
 qemu-img.c     | 5 +++++
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 0eceefe..e2b4202 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -648,13 +648,11 @@ static int coroutine_fn qcow2_co_is_allocated(BlockDriverState *bs,
     int ret;
 
     *pnum = nb_sectors;
-    /* FIXME We can get errors here, but the bdrv_co_is_allocated interface
-     * can't pass them on today */
     qemu_co_mutex_lock(&s->lock);
     ret = qcow2_get_cluster_offset(bs, sector_num << 9, pnum, &cluster_offset);
     qemu_co_mutex_unlock(&s->lock);
     if (ret < 0) {
-        *pnum = 0;
+        return ret;
     }
 
     return (cluster_offset != 0) || (ret == QCOW2_CLUSTER_ZERO);
diff --git a/block/stream.c b/block/stream.c
index 802acae..28b702d 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -120,7 +120,7 @@ wait:
         if (ret == 1) {
             /* Allocated in the top, no need to copy.  */
             copy = false;
-        } else {
+        } else if (ret >= 0) {
             /* Copy if allocated in the intermediate images.  Limit to the
              * known-unallocated area [sector_num, sector_num+n).  */
             ret = bdrv_is_allocated_above(bs->backing_hd, base,
diff --git a/qemu-img.c b/qemu-img.c
index c55ca5c..a4957eb 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2073,6 +2073,11 @@ static int img_rebase(int argc, char **argv)
 
             /* If the cluster is allocated, we don't need to take action */
             ret = bdrv_is_allocated(bs, sector, n, &n);
+            if (ret < 0) {
+                error_report("error while reading image metadata: %s",
+                             strerror(-ret));
+                goto out;
+            }
             if (ret) {
                 continue;
             }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [PATCH v2 07/17] qemu-img: always probe the input image for allocated sectors
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
                   ` (5 preceding siblings ...)
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 06/17] block: expect errors from bdrv_co_is_allocated Paolo Bonzini
@ 2013-07-16 16:29 ` Paolo Bonzini
  2013-07-19 13:05   ` Eric Blake
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 08/17] block: make bdrv_has_zero_init return false for copy-on-write-images Paolo Bonzini
                   ` (13 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-16 16:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: pl, famz, stefanha

qemu-img convert is assuming "that sectors which are unallocated in the input
image are present in both the output's and input's base images", but it is
only doing this if the output image is zero initialized.  And checking if
the output image is zero initialized does not make much sense if the
output image is copy-on-write.  Always do the test.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 qemu-img.c | 26 ++++++++++++--------------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index a4957eb..3faffee 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1476,21 +1476,19 @@ static int img_convert(int argc, char **argv)
                 n = bs_offset + bs_sectors - sector_num;
             }
 
-            if (has_zero_init) {
-                /* If the output image is being created as a copy on write image,
-                   assume that sectors which are unallocated in the input image
-                   are present in both the output's and input's base images (no
-                   need to copy them). */
-                if (out_baseimg) {
-                    if (!bdrv_is_allocated(bs[bs_i], sector_num - bs_offset,
-                                           n, &n1)) {
-                        sector_num += n1;
-                        continue;
-                    }
-                    /* The next 'n1' sectors are allocated in the input image. Copy
-                       only those as they may be followed by unallocated sectors. */
-                    n = n1;
+            /* If the output image is being created as a copy on write image,
+               assume that sectors which are unallocated in the input image
+               are present in both the output's and input's base images (no
+               need to copy them). */
+            if (out_baseimg) {
+                if (!bdrv_is_allocated(bs[bs_i], sector_num - bs_offset,
+                                       n, &n1)) {
+                    sector_num += n1;
+                    continue;
                 }
+                /* The next 'n1' sectors are allocated in the input image. Copy
+                   only those as they may be followed by unallocated sectors. */
+                n = n1;
             } else {
                 n1 = n;
             }
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [PATCH v2 08/17] block: make bdrv_has_zero_init return false for copy-on-write-images
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
                   ` (6 preceding siblings ...)
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 07/17] qemu-img: always probe the input image for allocated sectors Paolo Bonzini
@ 2013-07-16 16:29 ` Paolo Bonzini
  2013-07-19 13:21   ` Eric Blake
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 09/17] block: introduce bdrv_get_block_status API Paolo Bonzini
                   ` (12 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-16 16:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: pl, famz, stefanha

This helps implementing is_allocated on top of get_block_status.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c    | 5 +++++
 qemu-img.c | 9 +--------
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/block.c b/block.c
index ae87cba..0fb409b 100644
--- a/block.c
+++ b/block.c
@@ -2934,6 +2934,11 @@ int bdrv_has_zero_init(BlockDriverState *bs)
 {
     assert(bs->drv);
 
+    /* If BS is a copy on write image, it is initialized to
+       the contents of the base image, which may not be zeroes.  */
+    if (bs->backing_hd) {
+        return 0;
+    }
     if (bs->drv->bdrv_has_zero_init) {
         return bs->drv->bdrv_has_zero_init(bs);
     }
diff --git a/qemu-img.c b/qemu-img.c
index 3faffee..c5c8ebc 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1504,14 +1504,7 @@ static int img_convert(int argc, char **argv)
                should add a specific call to have the info to go faster */
             buf1 = buf;
             while (n > 0) {
-                /* If the output image is being created as a copy on write image,
-                   copy all sectors even the ones containing only NUL bytes,
-                   because they may differ from the sectors in the base image.
-
-                   If the output is to a host device, we also write out
-                   sectors that are entirely 0, since whatever data was
-                   already there is garbage, not 0s. */
-                if (!has_zero_init || out_baseimg ||
+                if (!has_zero_init ||
                     is_allocated_sectors_min(buf1, n, &n1, min_sparse)) {
                     ret = bdrv_write(out_bs, sector_num, buf1, n1);
                     if (ret < 0) {
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [PATCH v2 09/17] block: introduce bdrv_get_block_status API
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
                   ` (7 preceding siblings ...)
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 08/17] block: make bdrv_has_zero_init return false for copy-on-write-images Paolo Bonzini
@ 2013-07-16 16:29 ` Paolo Bonzini
  2013-07-19 13:43   ` Eric Blake
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value Paolo Bonzini
                   ` (11 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-16 16:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: pl, famz, stefanha

For now, bdrv_get_block_status is just another name for bdrv_is_allocated.
The next patches will add more flags.

This also touches all block drivers with a mostly mechanical rename.  The
sole exception is cow; because it calls cow_co_is_allocated from the read
code, we keep that function and make cow_co_get_block_status a wrapper.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: rebase after vmdk changes

 block.c                   | 46 ++++++++++++++++++++++++++--------------------
 block/cow.c               |  8 +++++++-
 block/qcow.c              |  4 ++--
 block/qcow2.c             |  4 ++--
 block/qed.c               |  4 ++--
 block/raw-posix.c         |  4 ++--
 block/raw.c               |  6 +++---
 block/sheepdog.c          | 12 ++++++------
 block/vdi.c               |  4 ++--
 block/vmdk.c              |  4 ++--
 block/vvfat.c             |  4 ++--
 include/block/block.h     |  2 ++
 include/block/block_int.h |  2 +-
 13 files changed, 59 insertions(+), 45 deletions(-)

diff --git a/block.c b/block.c
index 0fb409b..6e7a8a3 100644
--- a/block.c
+++ b/block.c
@@ -2947,15 +2947,15 @@ int bdrv_has_zero_init(BlockDriverState *bs)
     return 0;
 }
 
-typedef struct BdrvCoIsAllocatedData {
+typedef struct BdrvCoGetBlockStatusData {
     BlockDriverState *bs;
     BlockDriverState *base;
     int64_t sector_num;
     int nb_sectors;
     int *pnum;
-    int ret;
+    int64_t ret;
     bool done;
-} BdrvCoIsAllocatedData;
+} BdrvCoGetBlockStatusData;
 
 /*
  * Returns true iff the specified sector is present in the disk image. Drivers
@@ -2972,9 +2972,9 @@ typedef struct BdrvCoIsAllocatedData {
  * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
  * beyond the end of the disk image it will be clamped.
  */
-static int coroutine_fn bdrv_co_is_allocated(BlockDriverState *bs,
-                                             int64_t sector_num,
-                                             int nb_sectors, int *pnum)
+static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
+                                                     int64_t sector_num,
+                                                     int nb_sectors, int *pnum)
 {
     int64_t n;
 
@@ -2988,35 +2988,35 @@ static int coroutine_fn bdrv_co_is_allocated(BlockDriverState *bs,
         nb_sectors = n;
     }
 
-    if (!bs->drv->bdrv_co_is_allocated) {
+    if (!bs->drv->bdrv_co_get_block_status) {
         *pnum = nb_sectors;
         return 1;
     }
 
-    return bs->drv->bdrv_co_is_allocated(bs, sector_num, nb_sectors, pnum);
+    return bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum);
 }
 
-/* Coroutine wrapper for bdrv_is_allocated() */
-static void coroutine_fn bdrv_is_allocated_co_entry(void *opaque)
+/* Coroutine wrapper for bdrv_get_block_status() */
+static void coroutine_fn bdrv_get_block_status_co_entry(void *opaque)
 {
-    BdrvCoIsAllocatedData *data = opaque;
+    BdrvCoGetBlockStatusData *data = opaque;
     BlockDriverState *bs = data->bs;
 
-    data->ret = bdrv_co_is_allocated(bs, data->sector_num, data->nb_sectors,
-                                     data->pnum);
+    data->ret = bdrv_co_get_block_status(bs, data->sector_num, data->nb_sectors,
+                                         data->pnum);
     data->done = true;
 }
 
 /*
- * Synchronous wrapper around bdrv_co_is_allocated().
+ * Synchronous wrapper around bdrv_co_get_block_status().
  *
- * See bdrv_co_is_allocated() for details.
+ * See bdrv_co_get_block_status() for details.
  */
-int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
-                      int *pnum)
+int64_t bdrv_get_block_status(BlockDriverState *bs, int64_t sector_num,
+                              int nb_sectors, int *pnum)
 {
     Coroutine *co;
-    BdrvCoIsAllocatedData data = {
+    BdrvCoGetBlockStatusData data = {
         .bs = bs,
         .sector_num = sector_num,
         .nb_sectors = nb_sectors,
@@ -3026,9 +3026,9 @@ int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
 
     if (qemu_in_coroutine()) {
         /* Fast-path if already in coroutine context */
-        bdrv_is_allocated_co_entry(&data);
+        bdrv_get_block_status_co_entry(&data);
     } else {
-        co = qemu_coroutine_create(bdrv_is_allocated_co_entry);
+        co = qemu_coroutine_create(bdrv_get_block_status_co_entry);
         qemu_coroutine_enter(co, &data);
         while (!data.done) {
             qemu_aio_wait();
@@ -3037,6 +3037,12 @@ int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
     return data.ret;
 }
 
+int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num,
+                                   int nb_sectors, int *pnum)
+{
+    return bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
+}
+
 /*
  * Given an image chain: ... -> [BASE] -> [INTER1] -> [INTER2] -> [TOP]
  *
diff --git a/block/cow.c b/block/cow.c
index 2994f8d..a0fb1af 100644
--- a/block/cow.c
+++ b/block/cow.c
@@ -191,6 +191,12 @@ static int coroutine_fn cow_co_is_allocated(BlockDriverState *bs,
     return changed;
 }
 
+static int64_t coroutine_fn cow_co_get_block_status(BlockDriverState *bs,
+        int64_t sector_num, int nb_sectors, int *num_same)
+{
+    return cow_co_is_allocated(bs, sector_num, nb_sectors, num_same);
+}
+
 static int cow_update_bitmap(BlockDriverState *bs, int64_t sector_num,
         int nb_sectors)
 {
@@ -370,7 +376,7 @@ static BlockDriver bdrv_cow = {
 
     .bdrv_read              = cow_co_read,
     .bdrv_write             = cow_co_write,
-    .bdrv_co_is_allocated   = cow_co_is_allocated,
+    .bdrv_co_get_block_status   = cow_co_get_block_status,
 
     .create_options = cow_create_options,
 };
diff --git a/block/qcow.c b/block/qcow.c
index 5239bd6..acd1aeb 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -395,7 +395,7 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
     return cluster_offset;
 }
 
-static int coroutine_fn qcow_co_is_allocated(BlockDriverState *bs,
+static int64_t coroutine_fn qcow_co_get_block_status(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors, int *pnum)
 {
     BDRVQcowState *s = bs->opaque;
@@ -896,7 +896,7 @@ static BlockDriver bdrv_qcow = {
 
     .bdrv_co_readv          = qcow_co_readv,
     .bdrv_co_writev         = qcow_co_writev,
-    .bdrv_co_is_allocated   = qcow_co_is_allocated,
+    .bdrv_co_get_block_status   = qcow_co_get_block_status,
 
     .bdrv_set_key           = qcow_set_key,
     .bdrv_make_empty        = qcow_make_empty,
diff --git a/block/qcow2.c b/block/qcow2.c
index e2b4202..d35a134 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -640,7 +640,7 @@ static int qcow2_reopen_prepare(BDRVReopenState *state,
     return 0;
 }
 
-static int coroutine_fn qcow2_co_is_allocated(BlockDriverState *bs,
+static int64_t coroutine_fn qcow2_co_get_block_status(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors, int *pnum)
 {
     BDRVQcowState *s = bs->opaque;
@@ -1784,7 +1784,7 @@ static BlockDriver bdrv_qcow2 = {
     .bdrv_reopen_prepare  = qcow2_reopen_prepare,
     .bdrv_create        = qcow2_create,
     .bdrv_has_zero_init = bdrv_has_zero_init_1,
-    .bdrv_co_is_allocated = qcow2_co_is_allocated,
+    .bdrv_co_get_block_status = qcow2_co_get_block_status,
     .bdrv_set_key       = qcow2_set_key,
     .bdrv_make_empty    = qcow2_make_empty,
 
diff --git a/block/qed.c b/block/qed.c
index f767b05..b0978ba 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -667,7 +667,7 @@ static void qed_is_allocated_cb(void *opaque, int ret, uint64_t offset, size_t l
     }
 }
 
-static int coroutine_fn bdrv_qed_co_is_allocated(BlockDriverState *bs,
+static int64_t coroutine_fn bdrv_qed_co_get_block_status(BlockDriverState *bs,
                                                  int64_t sector_num,
                                                  int nb_sectors, int *pnum)
 {
@@ -1575,7 +1575,7 @@ static BlockDriver bdrv_qed = {
     .bdrv_reopen_prepare      = bdrv_qed_reopen_prepare,
     .bdrv_create              = bdrv_qed_create,
     .bdrv_has_zero_init       = bdrv_has_zero_init_1,
-    .bdrv_co_is_allocated     = bdrv_qed_co_is_allocated,
+    .bdrv_co_get_block_status = bdrv_qed_co_get_block_status,
     .bdrv_make_empty          = bdrv_qed_make_empty,
     .bdrv_aio_readv           = bdrv_qed_aio_readv,
     .bdrv_aio_writev          = bdrv_qed_aio_writev,
diff --git a/block/raw-posix.c b/block/raw-posix.c
index ba721d3..dbc65b0 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1084,7 +1084,7 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
  * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
  * beyond the end of the disk image it will be clamped.
  */
-static int coroutine_fn raw_co_is_allocated(BlockDriverState *bs,
+static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
                                             int64_t sector_num,
                                             int nb_sectors, int *pnum)
 {
@@ -1200,7 +1200,7 @@ static BlockDriver bdrv_file = {
     .bdrv_close = raw_close,
     .bdrv_create = raw_create,
     .bdrv_has_zero_init = bdrv_has_zero_init_1,
-    .bdrv_co_is_allocated = raw_co_is_allocated,
+    .bdrv_co_get_block_status = raw_co_get_block_status,
 
     .bdrv_aio_readv = raw_aio_readv,
     .bdrv_aio_writev = raw_aio_writev,
diff --git a/block/raw.c b/block/raw.c
index df4a868..1c7380d 100644
--- a/block/raw.c
+++ b/block/raw.c
@@ -35,11 +35,11 @@ static void raw_close(BlockDriverState *bs)
 {
 }
 
-static int coroutine_fn raw_co_is_allocated(BlockDriverState *bs,
+static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
                                             int64_t sector_num,
                                             int nb_sectors, int *pnum)
 {
-    return bdrv_is_allocated(bs->file, sector_num, nb_sectors, pnum);
+    return bdrv_get_block_status(bs->file, sector_num, nb_sectors, pnum);
 }
 
 static int64_t raw_getlength(BlockDriverState *bs)
@@ -127,7 +127,7 @@ static BlockDriver bdrv_raw = {
 
     .bdrv_co_readv          = raw_co_readv,
     .bdrv_co_writev         = raw_co_writev,
-    .bdrv_co_is_allocated   = raw_co_is_allocated,
+    .bdrv_co_get_block_status   = raw_co_get_block_status,
     .bdrv_co_discard        = raw_co_discard,
 
     .bdrv_probe         = raw_probe,
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 6a41ad9..31cc573 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -2287,9 +2287,9 @@ static coroutine_fn int sd_co_discard(BlockDriverState *bs, int64_t sector_num,
     return acb->ret;
 }
 
-static coroutine_fn int
-sd_co_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
-                   int *pnum)
+static coroutine_fn int64_t
+sd_co_get_block_status(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
+                       int *pnum)
 {
     BDRVSheepdogState *s = bs->opaque;
     SheepdogInode *inode = &s->inode;
@@ -2354,7 +2354,7 @@ static BlockDriver bdrv_sheepdog = {
     .bdrv_co_writev = sd_co_writev,
     .bdrv_co_flush_to_disk  = sd_co_flush_to_disk,
     .bdrv_co_discard = sd_co_discard,
-    .bdrv_co_is_allocated = sd_co_is_allocated,
+    .bdrv_co_get_block_status = sd_co_get_block_status,
 
     .bdrv_snapshot_create   = sd_snapshot_create,
     .bdrv_snapshot_goto     = sd_snapshot_goto,
@@ -2381,7 +2381,7 @@ static BlockDriver bdrv_sheepdog_tcp = {
     .bdrv_co_writev = sd_co_writev,
     .bdrv_co_flush_to_disk  = sd_co_flush_to_disk,
     .bdrv_co_discard = sd_co_discard,
-    .bdrv_co_is_allocated = sd_co_is_allocated,
+    .bdrv_co_get_block_status = sd_co_get_block_status,
 
     .bdrv_snapshot_create   = sd_snapshot_create,
     .bdrv_snapshot_goto     = sd_snapshot_goto,
@@ -2409,7 +2409,7 @@ static BlockDriver bdrv_sheepdog_unix = {
     .bdrv_co_writev = sd_co_writev,
     .bdrv_co_flush_to_disk  = sd_co_flush_to_disk,
     .bdrv_co_discard = sd_co_discard,
-    .bdrv_co_is_allocated = sd_co_is_allocated,
+    .bdrv_co_get_block_status = sd_co_get_block_status,
 
     .bdrv_snapshot_create   = sd_snapshot_create,
     .bdrv_snapshot_goto     = sd_snapshot_goto,
diff --git a/block/vdi.c b/block/vdi.c
index 8a91525..7ab2567 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -470,7 +470,7 @@ static int vdi_reopen_prepare(BDRVReopenState *state,
     return 0;
 }
 
-static int coroutine_fn vdi_co_is_allocated(BlockDriverState *bs,
+static int64_t coroutine_fn vdi_co_get_block_status(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors, int *pnum)
 {
     /* TODO: Check for too large sector_num (in bdrv_is_allocated or here). */
@@ -780,7 +780,7 @@ static BlockDriver bdrv_vdi = {
     .bdrv_reopen_prepare = vdi_reopen_prepare,
     .bdrv_create = vdi_create,
     .bdrv_has_zero_init = bdrv_has_zero_init_1,
-    .bdrv_co_is_allocated = vdi_co_is_allocated,
+    .bdrv_co_get_block_status = vdi_co_get_block_status,
     .bdrv_make_empty = vdi_make_empty,
 
     .bdrv_read = vdi_co_read,
diff --git a/block/vmdk.c b/block/vmdk.c
index 3756333..026840c 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -998,7 +998,7 @@ static VmdkExtent *find_extent(BDRVVmdkState *s,
     return NULL;
 }
 
-static int coroutine_fn vmdk_co_is_allocated(BlockDriverState *bs,
+static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors, int *pnum)
 {
     BDRVVmdkState *s = bs->opaque;
@@ -1790,7 +1790,7 @@ static BlockDriver bdrv_vmdk = {
     .bdrv_close                   = vmdk_close,
     .bdrv_create                  = vmdk_create,
     .bdrv_co_flush_to_disk        = vmdk_co_flush,
-    .bdrv_co_is_allocated         = vmdk_co_is_allocated,
+    .bdrv_co_get_block_status     = vmdk_co_get_block_status,
     .bdrv_get_allocated_file_size = vmdk_get_allocated_file_size,
     .bdrv_has_zero_init           = vmdk_has_zero_init,
 
diff --git a/block/vvfat.c b/block/vvfat.c
index 87b0279..510a559 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -2874,7 +2874,7 @@ static coroutine_fn int vvfat_co_write(BlockDriverState *bs, int64_t sector_num,
     return ret;
 }
 
-static int coroutine_fn vvfat_co_is_allocated(BlockDriverState *bs,
+static int64_t coroutine_fn vvfat_co_get_block_status(BlockDriverState *bs,
 	int64_t sector_num, int nb_sectors, int* n)
 {
     BDRVVVFATState* s = bs->opaque;
@@ -2981,7 +2981,7 @@ static BlockDriver bdrv_vvfat = {
 
     .bdrv_read              = vvfat_co_read,
     .bdrv_write             = vvfat_co_write,
-    .bdrv_co_is_allocated   = vvfat_co_is_allocated,
+    .bdrv_co_get_block_status = vvfat_co_get_block_status,
 };
 
 static void bdrv_vvfat_init(void)
diff --git a/include/block/block.h b/include/block/block.h
index b567b7b..fbc0822 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -269,6 +269,8 @@ int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors);
 int bdrv_co_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors);
 int bdrv_has_zero_init_1(BlockDriverState *bs);
 int bdrv_has_zero_init(BlockDriverState *bs);
+int64_t bdrv_get_block_status(BlockDriverState *bs, int64_t sector_num,
+                              int nb_sectors, int *pnum);
 int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
                       int *pnum);
 int bdrv_is_allocated_above(BlockDriverState *top, BlockDriverState *base,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index c6ac871..8f2d2c7 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -134,7 +134,7 @@ struct BlockDriver {
         int64_t sector_num, int nb_sectors);
     int coroutine_fn (*bdrv_co_discard)(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors);
-    int coroutine_fn (*bdrv_co_is_allocated)(BlockDriverState *bs,
+    int64_t coroutine_fn (*bdrv_co_get_block_status)(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors, int *pnum);
 
     /*
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
                   ` (8 preceding siblings ...)
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 09/17] block: introduce bdrv_get_block_status API Paolo Bonzini
@ 2013-07-16 16:29 ` Paolo Bonzini
  2013-07-19  6:35   ` Peter Lieven
                     ` (2 more replies)
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 11/17] block: return get_block_status data and flags for formats Paolo Bonzini
                   ` (10 subsequent siblings)
  20 siblings, 3 replies; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-16 16:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: pl, famz, stefanha

Define the return value of get_block_status.  Bits 0, 1, 2 and 9-62
are valid; bit 63 (the sign bit) is reserved for errors.  Bits 3-7
are left for future extensions.

The return code is compatible with the old is_allocated API: returning
just 0 or 1 (aka BDRV_BLOCK_DATA) will not cause any behavioral change
in clients of is_allocated.  We will return more precise information
in the next patches.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: improved comment

 block.c               |  7 +++++--
 include/block/block.h | 26 ++++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/block.c b/block.c
index 6e7a8a3..7ff0716 100644
--- a/block.c
+++ b/block.c
@@ -2990,7 +2990,7 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
 
     if (!bs->drv->bdrv_co_get_block_status) {
         *pnum = nb_sectors;
-        return 1;
+        return BDRV_BLOCK_DATA;
     }
 
     return bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum);
@@ -3040,7 +3040,10 @@ int64_t bdrv_get_block_status(BlockDriverState *bs, int64_t sector_num,
 int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num,
                                    int nb_sectors, int *pnum)
 {
-    return bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
+    int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
+    return
+        (ret & BDRV_BLOCK_DATA) ||
+        ((ret & BDRV_BLOCK_ZERO) && !bdrv_has_zero_init(bs));
 }
 
 /*
diff --git a/include/block/block.h b/include/block/block.h
index fbc0822..83c7112 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -81,6 +81,32 @@ typedef struct BlockDevOps {
 #define BDRV_SECTOR_SIZE   (1ULL << BDRV_SECTOR_BITS)
 #define BDRV_SECTOR_MASK   ~(BDRV_SECTOR_SIZE - 1)
 
+/* BDRV_BLOCK_DATA: data is read from bs->file or another file
+ * BDRV_BLOCK_ZERO: sectors read as zero
+ * BDRV_BLOCK_OFFSET_VALID: sector stored in bs->file as raw data
+ *
+ * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 represent the offset in
+ * bs->file where sector data can be read from as raw data.
+ *
+ * DATA == 0 && ZERO == 0 means that data is read from backing_hd if present.
+ *
+ * DATA ZERO OFFSET_VALID
+ *  t    t        t       sectors read as zero, bs->file is zero at offset
+ *  t    f        t       sectors read as valid from bs->file at offset
+ *  f    t        t       sectors preallocated, read as zero, bs->file not
+ *                        necessarily zero at offset
+ *  f    f        t       sectors preallocated but read from backing_hd,
+ *                        bs->file contains garbage at offset
+ *  t    t        f       sectors preallocated, read as zero, unknown offset
+ *  t    f        f       sectors read from unknown file or offset
+ *  f    t        f       not allocated or unknown offset, read as zero
+ *  f    f        f       not allocated or unknown offset, read from backing_hd
+ */
+#define BDRV_BLOCK_DATA         1
+#define BDRV_BLOCK_ZERO         2
+#define BDRV_BLOCK_OFFSET_VALID 4
+#define BDRV_BLOCK_OFFSET_MASK  BDRV_SECTOR_MASK
+
 typedef enum {
     BDRV_ACTION_REPORT, BDRV_ACTION_IGNORE, BDRV_ACTION_STOP
 } BlockErrorAction;
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [PATCH v2 11/17] block: return get_block_status data and flags for formats
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
                   ` (9 preceding siblings ...)
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value Paolo Bonzini
@ 2013-07-16 16:29 ` Paolo Bonzini
  2013-07-19 19:25   ` Eric Blake
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 12/17] qemu-img: add a "map" subcommand Paolo Bonzini
                   ` (9 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-16 16:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: pl, famz, stefanha

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: handle extents that are stored in bs->file

 block/cow.c      |  8 +++++++-
 block/qcow.c     |  9 ++++++++-
 block/qcow2.c    | 16 ++++++++++++++--
 block/qed.c      | 35 ++++++++++++++++++++++++++++-------
 block/sheepdog.c |  2 +-
 block/vdi.c      | 13 ++++++++++++-
 block/vmdk.c     | 19 ++++++++++++++++++-
 block/vvfat.c    | 11 ++++++-----
 8 files changed, 94 insertions(+), 19 deletions(-)

diff --git a/block/cow.c b/block/cow.c
index a0fb1af..194ceeb 100644
--- a/block/cow.c
+++ b/block/cow.c
@@ -194,7 +194,13 @@ static int coroutine_fn cow_co_is_allocated(BlockDriverState *bs,
 static int64_t coroutine_fn cow_co_get_block_status(BlockDriverState *bs,
         int64_t sector_num, int nb_sectors, int *num_same)
 {
-    return cow_co_is_allocated(bs, sector_num, nb_sectors, num_same);
+    BDRVCowState *s = bs->opaque;
+    int ret = cow_co_is_allocated(bs, sector_num, nb_sectors, num_same);
+    int64_t offset = s->cow_sectors_offset + (sector_num << BDRV_SECTOR_BITS);
+    if (ret < 0) {
+        return ret;
+    }
+    return (ret ? BDRV_BLOCK_DATA : 0) | offset | BDRV_BLOCK_OFFSET_VALID;
 }
 
 static int cow_update_bitmap(BlockDriverState *bs, int64_t sector_num,
diff --git a/block/qcow.c b/block/qcow.c
index acd1aeb..9aebcc3 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -410,7 +410,14 @@ static int64_t coroutine_fn qcow_co_get_block_status(BlockDriverState *bs,
     if (n > nb_sectors)
         n = nb_sectors;
     *pnum = n;
-    return (cluster_offset != 0);
+    if (!cluster_offset) {
+	return 0;
+    }
+    if ((cluster_offset & QCOW_OFLAG_COMPRESSED) || s->crypt_method) {
+        return BDRV_BLOCK_DATA;
+    }
+    cluster_offset |= (index_in_cluster << BDRV_SECTOR_BITS);
+    return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | cluster_offset;
 }
 
 static int decompress_buffer(uint8_t *out_buf, int out_buf_size,
diff --git a/block/qcow2.c b/block/qcow2.c
index d35a134..cac7ee8 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -645,7 +645,8 @@ static int64_t coroutine_fn qcow2_co_get_block_status(BlockDriverState *bs,
 {
     BDRVQcowState *s = bs->opaque;
     uint64_t cluster_offset;
-    int ret;
+    int index_in_cluster, ret;
+    int64_t status = 0;
 
     *pnum = nb_sectors;
     qemu_co_mutex_lock(&s->lock);
@@ -655,7 +656,18 @@ static int64_t coroutine_fn qcow2_co_get_block_status(BlockDriverState *bs,
         return ret;
     }
 
-    return (cluster_offset != 0) || (ret == QCOW2_CLUSTER_ZERO);
+    if (cluster_offset != 0 && ret != QCOW2_CLUSTER_COMPRESSED &&
+        !s->crypt_method) {
+        index_in_cluster = sector_num & (s->cluster_sectors - 1);
+        cluster_offset |= (index_in_cluster << BDRV_SECTOR_BITS);
+        status |= BDRV_BLOCK_OFFSET_VALID | cluster_offset;
+    }
+    if (ret == QCOW2_CLUSTER_ZERO) {
+        status |= BDRV_BLOCK_ZERO;
+    } else if (ret != QCOW2_CLUSTER_UNALLOCATED) {
+        status |= BDRV_BLOCK_DATA;
+    }
+    return status;
 }
 
 /* handle reading after the end of the backing file */
diff --git a/block/qed.c b/block/qed.c
index b0978ba..cf4fbed 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -652,16 +652,36 @@ static int bdrv_qed_create(const char *filename, QEMUOptionParameter *options)
 }
 
 typedef struct {
+    BlockDriverState *bs;
     Coroutine *co;
-    int is_allocated;
+    uint64_t pos;
+    int64_t status;
     int *pnum;
 } QEDIsAllocatedCB;
 
 static void qed_is_allocated_cb(void *opaque, int ret, uint64_t offset, size_t len)
 {
     QEDIsAllocatedCB *cb = opaque;
+    BDRVQEDState *s = cb->bs->opaque;
     *cb->pnum = len / BDRV_SECTOR_SIZE;
-    cb->is_allocated = (ret == QED_CLUSTER_FOUND || ret == QED_CLUSTER_ZERO);
+    switch (ret) {
+    case QED_CLUSTER_FOUND:
+       offset |= qed_offset_into_cluster(s, cb->pos);
+	cb->status = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
+	break;
+    case QED_CLUSTER_ZERO:
+	cb->status = BDRV_BLOCK_ZERO;
+	break;
+    case QED_CLUSTER_L2:
+    case QED_CLUSTER_L1:
+	cb->status = 0;
+	break;
+    default:
+	assert(ret < 0);
+	cb->status = ret;
+        break;
+    }
+
     if (cb->co) {
         qemu_coroutine_enter(cb->co, NULL);
     }
@@ -672,25 +692,26 @@ static int64_t coroutine_fn bdrv_qed_co_get_block_status(BlockDriverState *bs,
                                                  int nb_sectors, int *pnum)
 {
     BDRVQEDState *s = bs->opaque;
-    uint64_t pos = (uint64_t)sector_num * BDRV_SECTOR_SIZE;
     size_t len = (size_t)nb_sectors * BDRV_SECTOR_SIZE;
     QEDIsAllocatedCB cb = {
-        .is_allocated = -1,
+        .bs = bs,
+        .pos = (uint64_t)sector_num * BDRV_SECTOR_SIZE,
+        .status = BDRV_BLOCK_OFFSET_MASK,
         .pnum = pnum,
     };
     QEDRequest request = { .l2_table = NULL };
 
-    qed_find_cluster(s, &request, pos, len, qed_is_allocated_cb, &cb);
+    qed_find_cluster(s, &request, cb.pos, len, qed_is_allocated_cb, &cb);
 
     /* Now sleep if the callback wasn't invoked immediately */
-    while (cb.is_allocated == -1) {
+    while (cb.status == BDRV_BLOCK_OFFSET_MASK) {
         cb.co = qemu_coroutine_self();
         qemu_coroutine_yield();
     }
 
     qed_unref_l2_cache_entry(request.l2_table);
 
-    return cb.is_allocated;
+    return cb.status;
 }
 
 static int bdrv_qed_make_empty(BlockDriverState *bs)
diff --git a/block/sheepdog.c b/block/sheepdog.c
index 31cc573..b0ff8f1 100644
--- a/block/sheepdog.c
+++ b/block/sheepdog.c
@@ -2297,7 +2297,7 @@ sd_co_get_block_status(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
                   end = DIV_ROUND_UP((sector_num + nb_sectors) *
                                      BDRV_SECTOR_SIZE, SD_DATA_OBJ_SIZE);
     unsigned long idx;
-    int ret = 1;
+    int64_t ret = BDRV_BLOCK_DATA;
 
     for (idx = start; idx < end; idx++) {
         if (inode->data_vdi_id[idx] == 0) {
diff --git a/block/vdi.c b/block/vdi.c
index 7ab2567..e96b5b3 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -479,12 +479,23 @@ static int64_t coroutine_fn vdi_co_get_block_status(BlockDriverState *bs,
     size_t sector_in_block = sector_num % s->block_sectors;
     int n_sectors = s->block_sectors - sector_in_block;
     uint32_t bmap_entry = le32_to_cpu(s->bmap[bmap_index]);
+    uint64_t offset;
+    int result;
+
     logout("%p, %" PRId64 ", %d, %p\n", bs, sector_num, nb_sectors, pnum);
     if (n_sectors > nb_sectors) {
         n_sectors = nb_sectors;
     }
     *pnum = n_sectors;
-    return VDI_IS_ALLOCATED(bmap_entry);
+    result = VDI_IS_ALLOCATED(bmap_entry);
+    if (!result) {
+	return 0;
+    }
+
+    offset = s->header.offset_data +
+                              (uint64_t)bmap_entry * s->block_size +
+                              sector_in_block * SECTOR_SIZE;
+    return BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
 }
 
 static int vdi_co_read(BlockDriverState *bs,
diff --git a/block/vmdk.c b/block/vmdk.c
index 026840c..cd8db57 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1015,7 +1015,24 @@ static int64_t coroutine_fn vmdk_co_get_block_status(BlockDriverState *bs,
                             sector_num * 512, 0, &offset);
     qemu_co_mutex_unlock(&s->lock);
 
-    ret = (ret == VMDK_OK || ret == VMDK_ZEROED);
+    switch (ret) {
+    case VMDK_ERROR:
+        ret = -EIO;
+        break;
+    case VMDK_UNALLOC:
+        ret = 0;
+        break;
+    case VMDK_ZEROED:
+        ret = BDRV_BLOCK_ZERO;
+        break;
+    case VMDK_OK:
+        ret = BDRV_BLOCK_DATA;
+        if (extent->file == bs->file) {
+            ret |= BDRV_BLOCK_OFFSET_VALID | offset;
+        }
+
+        break;
+    }
 
     index_in_cluster = sector_num % extent->cluster_sectors;
     n = extent->cluster_sectors - index_in_cluster;
diff --git a/block/vvfat.c b/block/vvfat.c
index 510a559..415fba3 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -2879,11 +2879,12 @@ static int64_t coroutine_fn vvfat_co_get_block_status(BlockDriverState *bs,
 {
     BDRVVVFATState* s = bs->opaque;
     *n = s->sector_count - sector_num;
-    if (*n > nb_sectors)
-	*n = nb_sectors;
-    else if (*n < 0)
-	return 0;
-    return 1;
+    if (*n > nb_sectors) {
+        *n = nb_sectors;
+    } else if (*n < 0) {
+        return 0;
+    }
+    return BDRV_BLOCK_DATA;
 }
 
 static int write_target_commit(BlockDriverState *bs, int64_t sector_num,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [PATCH v2 12/17] qemu-img: add a "map" subcommand
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
                   ` (10 preceding siblings ...)
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 11/17] block: return get_block_status data and flags for formats Paolo Bonzini
@ 2013-07-16 16:29 ` Paolo Bonzini
  2013-07-18  7:25   ` Fam Zheng
  2013-07-19 19:36   ` Eric Blake
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 13/17] block: use bdrv_has_zero_init to return BDRV_BLOCK_ZERO Paolo Bonzini
                   ` (8 subsequent siblings)
  20 siblings, 2 replies; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-16 16:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: pl, famz, stefanha

This command dumps the metadata of an entire chain, in either tabular or JSON
format.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
        v1->v2: mention encrypted clusters, use PRId64

 qemu-img-cmds.hx |   6 ++
 qemu-img.c       | 185 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 191 insertions(+)

diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 4ca7e95..acc3a1c 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -45,6 +45,12 @@ STEXI
 @item info [-f @var{fmt}] [--output=@var{ofmt}] [--backing-chain] @var{filename}
 ETEXI
 
+DEF("map", img_map,
+    "map [-f fmt] [--output=ofmt] filename")
+STEXI
+@item map [-f @var{fmt}] [--output=@var{ofmt}] filename
+ETEXI
+
 DEF("snapshot", img_snapshot,
     "snapshot [-q] [-l | -a snapshot | -c snapshot | -d snapshot] filename")
 STEXI
diff --git a/qemu-img.c b/qemu-img.c
index c5c8ebc..b905bb6 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1768,6 +1768,191 @@ static int img_info(int argc, char **argv)
     return 0;
 }
 
+
+typedef struct MapEntry {
+    int flags;
+    int depth;
+    int64_t start;
+    int64_t length;
+    int64_t offset;
+} MapEntry;
+
+static void dump_map_entry(OutputFormat output_format, MapEntry *e,
+                           MapEntry *next)
+{
+    switch (output_format) {
+    case OFORMAT_HUMAN:
+        if ((e->flags & (BDRV_BLOCK_DATA|BDRV_BLOCK_ZERO)) == BDRV_BLOCK_DATA) {
+            printf("%"PRId64" %"PRId64" %d %"PRId64"\n",
+                   e->start, e->length, e->depth, e->offset);
+        }
+        /* This format ignores the distinction between 0, ZERO and ZERO|DATA.
+         * Modify the flags here to allow more coalescing.
+         */
+        if (next &&
+            (next->flags & (BDRV_BLOCK_DATA|BDRV_BLOCK_ZERO)) != BDRV_BLOCK_DATA) {
+            next->flags &= ~BDRV_BLOCK_DATA;
+            next->flags |= BDRV_BLOCK_ZERO;
+        }
+        break;
+    case OFORMAT_JSON:
+        printf("%s{ 'start': %"PRId64", 'length': %"PRId64", 'depth': %d, "
+               "'zero': %s, 'data': %s",
+               (e->start == 0 ? "[" : ",\n"),
+               e->start, e->length, e->depth,
+               (e->flags & BDRV_BLOCK_ZERO) ? "true" : "false",
+               (e->flags & BDRV_BLOCK_DATA) ? "true" : "false");
+        if (e->flags & BDRV_BLOCK_OFFSET_VALID) {
+            printf(", 'offset': %"PRId64"", e->offset);
+        }
+        putchar('}');
+
+        if (!next) {
+            printf("]\n");
+        }
+        break;
+    }
+}
+
+static int64_t get_block_status(BlockDriverState *bs, int64_t sector_num,
+                                int *nb_sectors, int *depth)
+{
+    int64_t ret;
+
+    /* As an optimization, we could cache the current range of unallocated
+     * clusters in each file of the chain, and avoid querying the same
+     * range repeatedly.
+     */
+
+    *depth = 0;
+    for (;;) {
+        int orig_nb_sectors = *nb_sectors;
+
+        if (bs == NULL) {
+            return 0;
+        }
+        ret = bdrv_get_block_status(bs, sector_num, *nb_sectors, nb_sectors);
+        if (ret < 0) {
+            return ret;
+        }
+
+        if (ret & (BDRV_BLOCK_ZERO|BDRV_BLOCK_DATA)) {
+            return ret;
+        }
+        if (!*nb_sectors) {
+            /* Beyond the end of this image (while presumably within the
+             * range of the BlockDriverState above this one).  The extra
+             * data is read as zeroes.
+             */
+            *nb_sectors = orig_nb_sectors;
+            return BDRV_BLOCK_ZERO;
+        }
+
+        bs = bs->backing_hd;
+        (*depth)++;
+    }
+}
+
+static int img_map(int argc, char **argv)
+{
+    int c;
+    OutputFormat output_format = OFORMAT_HUMAN;
+    BlockDriverState *bs;
+    const char *filename, *fmt, *output;
+    int64_t length;
+    MapEntry curr = { .length = 0 }, next;
+
+    fmt = NULL;
+    output = NULL;
+    for(;;) {
+        int option_index = 0;
+        static const struct option long_options[] = {
+            {"help", no_argument, 0, 'h'},
+            {"format", required_argument, 0, 'f'},
+            {"output", required_argument, 0, OPTION_OUTPUT},
+            {0, 0, 0, 0}
+        };
+        c = getopt_long(argc, argv, "f:h",
+                        long_options, &option_index);
+        if (c == -1) {
+            break;
+        }
+        switch(c) {
+        case '?':
+        case 'h':
+            help();
+            break;
+        case 'f':
+            fmt = optarg;
+            break;
+        case OPTION_OUTPUT:
+            output = optarg;
+            break;
+        }
+    }
+    if (optind >= argc) {
+        help();
+    }
+    filename = argv[optind++];
+
+    if (output && !strcmp(output, "json")) {
+        output_format = OFORMAT_JSON;
+    } else if (output && !strcmp(output, "human")) {
+        output_format = OFORMAT_HUMAN;
+    } else if (output) {
+        error_report("--output must be used with human or json as argument.");
+        return 1;
+    }
+
+    bs = bdrv_new_open(filename, fmt, BDRV_O_FLAGS, true, false);
+    if (!bs) {
+        return 1;
+    }
+
+    length = bdrv_getlength(bs);
+    while (curr.start + curr.length < length) {
+        int64_t nsectors_left, ret;
+        int64_t sector_num;
+        int n, depth, flags;
+
+        /* Probe up to 1 G at a time.  */
+        sector_num = (curr.start + curr.length) >> BDRV_SECTOR_BITS;
+        nsectors_left = DIV_ROUND_UP(length, BDRV_SECTOR_SIZE) - sector_num;
+        n = MAX(1 << (30 - BDRV_SECTOR_BITS), nsectors_left);
+        ret = get_block_status(bs, sector_num, &n, &depth);
+
+        if (ret < 0) {
+            error_report("Could not read file metadata: %s", strerror(-ret));
+            return 1;
+        }
+        if ((ret & BDRV_BLOCK_DATA) &&
+            !(ret & BDRV_BLOCK_OFFSET_VALID)) {
+            error_report("File contains external, encrypted or compressed clusters.");
+            return 1;
+        }
+
+        flags = ret & ~BDRV_BLOCK_OFFSET_MASK;
+        ret &= BDRV_BLOCK_OFFSET_MASK;
+        if (curr.length == 0 || curr.flags != flags || curr.depth != depth ||
+            ((curr.flags & BDRV_BLOCK_OFFSET_VALID) &&
+             curr.offset + curr.length != ret)) {
+            next.flags = flags;
+            next.depth = depth;
+            next.start = sector_num << BDRV_SECTOR_BITS;
+            next.offset = ret;
+            next.length = 0;
+            if (curr.length > 0) {
+                dump_map_entry(output_format, &curr, &next);
+            }
+            curr = next;
+        }
+        curr.length += n << BDRV_SECTOR_BITS;
+    }
+
+    dump_map_entry(output_format, &curr, NULL);
+    return 0;
+}
+
 #define SNAPSHOT_LIST   1
 #define SNAPSHOT_CREATE 2
 #define SNAPSHOT_APPLY  3
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [PATCH v2 13/17] block: use bdrv_has_zero_init to return BDRV_BLOCK_ZERO
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
                   ` (11 preceding siblings ...)
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 12/17] qemu-img: add a "map" subcommand Paolo Bonzini
@ 2013-07-16 16:29 ` Paolo Bonzini
  2013-07-17 17:50   ` Peter Lieven
  2013-07-19 19:37   ` Eric Blake
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 14/17] raw-posix: return get_block_status data and flags Paolo Bonzini
                   ` (7 subsequent siblings)
  20 siblings, 2 replies; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-16 16:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: pl, famz, stefanha

Alternatively, this could use a "discard zeroes data" flag returned
by bdrv_get_info.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 7ff0716..0c867b8 100644
--- a/block.c
+++ b/block.c
@@ -2977,6 +2977,7 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
                                                      int nb_sectors, int *pnum)
 {
     int64_t n;
+    int64_t ret;
 
     if (sector_num >= bs->total_sectors) {
         *pnum = 0;
@@ -2993,7 +2994,12 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
         return BDRV_BLOCK_DATA;
     }
 
-    return bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum);
+    ret = bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum);
+    if (!(ret & BDRV_BLOCK_DATA) && bdrv_has_zero_init(bs)) {
+        ret |= BDRV_BLOCK_ZERO;
+    }
+
+    return ret;
 }
 
 /* Coroutine wrapper for bdrv_get_block_status() */
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [PATCH v2 14/17] raw-posix: return get_block_status data and flags
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
                   ` (12 preceding siblings ...)
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 13/17] block: use bdrv_has_zero_init to return BDRV_BLOCK_ZERO Paolo Bonzini
@ 2013-07-16 16:29 ` Paolo Bonzini
  2013-07-19 19:48   ` Eric Blake
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 15/17] raw-posix: detect XFS unwritten extents Paolo Bonzini
                   ` (6 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-16 16:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: pl, famz, stefanha

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/raw-posix.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index dbc65b0..d011cfd 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1089,7 +1089,7 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
                                             int nb_sectors, int *pnum)
 {
     off_t start, data, hole;
-    int ret;
+    int64_t ret;
 
     ret = fd_open(bs);
     if (ret < 0) {
@@ -1097,6 +1097,7 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
     }
 
     start = sector_num * BDRV_SECTOR_SIZE;
+    ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
 
 #ifdef CONFIG_FIEMAP
 
@@ -1114,7 +1115,7 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
     if (ioctl(s->fd, FS_IOC_FIEMAP, &f) == -1) {
         /* Assume everything is allocated.  */
         *pnum = nb_sectors;
-        return 1;
+        return ret;
     }
 
     if (f.fm.fm_mapped_extents == 0) {
@@ -1141,7 +1142,7 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
 
         /* Most likely EINVAL.  Assume everything is allocated.  */
         *pnum = nb_sectors;
-        return 1;
+        return ret;
     }
 
     if (hole > start) {
@@ -1154,19 +1155,21 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
         }
     }
 #else
-    *pnum = nb_sectors;
-    return 1;
+    data = 0;
+    hole = start + nb_sectors * BDRV_SECTOR_SIZE;
 #endif
 
     if (data <= start) {
         /* On a data extent, compute sectors to the end of the extent.  */
         *pnum = MIN(nb_sectors, (hole - start) / BDRV_SECTOR_SIZE);
-        return 1;
     } else {
         /* On a hole, compute sectors to the beginning of the next extent.  */
         *pnum = MIN(nb_sectors, (data - start) / BDRV_SECTOR_SIZE);
-        return 0;
+        ret &= ~BDRV_BLOCK_DATA;
+        ret |= BDRV_BLOCK_ZERO;
     }
+
+    return ret;
 }
 
 static coroutine_fn BlockDriverAIOCB *raw_aio_discard(BlockDriverState *bs,
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [PATCH v2 15/17] raw-posix: detect XFS unwritten extents
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
                   ` (13 preceding siblings ...)
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 14/17] raw-posix: return get_block_status data and flags Paolo Bonzini
@ 2013-07-16 16:29 ` Paolo Bonzini
  2013-07-19 20:10   ` Eric Blake
  2013-08-02 15:05   ` Christoph Hellwig
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 16/17] block: add default get_block_status implementation for protocols Paolo Bonzini
                   ` (5 subsequent siblings)
  20 siblings, 2 replies; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-16 16:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: pl, famz, stefanha

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block/raw-posix.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/block/raw-posix.c b/block/raw-posix.c
index d011cfd..1b41ea3 100644
--- a/block/raw-posix.c
+++ b/block/raw-posix.c
@@ -1128,6 +1128,9 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
     } else {
         data = f.fe.fe_logical;
         hole = f.fe.fe_logical + f.fe.fe_length;
+        if (f.fe.fe_flags & FIEMAP_EXTENT_UNWRITTEN) {
+            ret |= BDRV_BLOCK_ZERO;
+        }
     }
 
 #elif defined SEEK_HOLE && defined SEEK_DATA
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [PATCH v2 16/17] block: add default get_block_status implementation for protocols
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
                   ` (14 preceding siblings ...)
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 15/17] raw-posix: detect XFS unwritten extents Paolo Bonzini
@ 2013-07-16 16:29 ` Paolo Bonzini
  2013-07-19 20:11   ` Eric Blake
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 17/17] block: look for zero blocks in bs->file Paolo Bonzini
                   ` (4 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-16 16:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: pl, famz, stefanha

Protocols return raw data, so you can assume the offsets to pass
through unchanged.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 0c867b8..557ce29 100644
--- a/block.c
+++ b/block.c
@@ -2991,7 +2991,11 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
 
     if (!bs->drv->bdrv_co_get_block_status) {
         *pnum = nb_sectors;
-        return BDRV_BLOCK_DATA;
+        ret = BDRV_BLOCK_DATA;
+        if (bs->drv->protocol_name) {
+            ret |= BDRV_BLOCK_OFFSET_VALID | (sector_num * BDRV_SECTOR_SIZE);
+        }
+        return ret;
     }
 
     ret = bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum);
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* [Qemu-devel] [PATCH v2 17/17] block: look for zero blocks in bs->file
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
                   ` (15 preceding siblings ...)
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 16/17] block: add default get_block_status implementation for protocols Paolo Bonzini
@ 2013-07-16 16:29 ` Paolo Bonzini
  2013-07-19  7:33   ` Stefan Hajnoczi
  2013-07-19 20:12   ` Eric Blake
  2013-07-16 16:56 ` [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Eric Blake
                   ` (3 subsequent siblings)
  20 siblings, 2 replies; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-16 16:29 UTC (permalink / raw)
  To: qemu-devel; +Cc: pl, famz, stefanha

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 block.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 557ce29..2d7d71f 100644
--- a/block.c
+++ b/block.c
@@ -2977,7 +2977,7 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
                                                      int nb_sectors, int *pnum)
 {
     int64_t n;
-    int64_t ret;
+    int64_t ret, ret2;
 
     if (sector_num >= bs->total_sectors) {
         *pnum = 0;
@@ -3003,6 +3003,14 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
         ret |= BDRV_BLOCK_ZERO;
     }
 
+    if (bs->file &&
+        (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
+        (ret & BDRV_BLOCK_OFFSET_VALID)) {
+        ret2 = bdrv_co_get_block_status(bs->file, ret >> BDRV_SECTOR_BITS,
+                                        *pnum, pnum);
+        ret |= (ret2 & BDRV_BLOCK_ZERO);
+    }
+
     return ret;
 }
 
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
                   ` (16 preceding siblings ...)
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 17/17] block: look for zero blocks in bs->file Paolo Bonzini
@ 2013-07-16 16:56 ` Eric Blake
  2013-07-16 16:57   ` Paolo Bonzini
  2013-07-18  8:17 ` Peter Lieven
                   ` (2 subsequent siblings)
  20 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2013-07-16 16:56 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 1213 bytes --]

On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
> This series adds a subcommand to "map" that can dump file metadata.
> Metadata that is dumped includes:
> 
> - whether blocks are allocated in bs->file and, if so, where
> 
> - whether blocks are zero
> 
> - whether data is read from bs or bs->backing_hd

Is this command only usable from qemu-img, or is there a QMP counterpart
for getting at the same information on a live image while qemu is running?

> 
> Metadata is dumped for an entire chain of images.  One example usage is
> (for non-compressed, non-encrypted images) to transform the metadata
> into a Linux device-mapper setup, and make a qcow2 image available (for
> read only) as a block device.  Another possible usage is to determine
> the used areas of a file, and convert it in place to another format.
> Alternatively, the richer data can be used by block jobs to quickly
> determine if a block is zero without reading it.

Definitely useful information!  I haven't looked at the implementation
yet, but there had better be a JSON representation of the output :)

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata
  2013-07-16 16:56 ` [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Eric Blake
@ 2013-07-16 16:57   ` Paolo Bonzini
  2013-07-19 10:48     ` Wenchao Xia
  0 siblings, 1 reply; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-16 16:57 UTC (permalink / raw)
  To: Eric Blake; +Cc: famz, pl, qemu-devel, stefanha

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Il 16/07/2013 18:56, Eric Blake ha scritto:
> On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
>> This series adds a subcommand to "map" that can dump file
>> metadata. Metadata that is dumped includes:
>> 
>> - whether blocks are allocated in bs->file and, if so, where
>> 
>> - whether blocks are zero
>> 
>> - whether data is read from bs or bs->backing_hd
> 
> Is this command only usable from qemu-img, or is there a QMP
> counterpart for getting at the same information on a live image
> while qemu is running?

Not yet, we had no use case yet.  It may be exposed later to the
guests via the SCSI command GET LBA STATUS.

Paolo

>> 
>> Metadata is dumped for an entire chain of images.  One example
>> usage is (for non-compressed, non-encrypted images) to transform
>> the metadata into a Linux device-mapper setup, and make a qcow2
>> image available (for read only) as a block device.  Another
>> possible usage is to determine the used areas of a file, and
>> convert it in place to another format. Alternatively, the richer
>> data can be used by block jobs to quickly determine if a block is
>> zero without reading it.
> 
> Definitely useful information!  I haven't looked at the
> implementation yet, but there had better be a JSON representation
> of the output :)
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJR5XtfAAoJEBvWZb6bTYbypRUP/jadsxWgkWGbefpIqUL7Iykt
EiPeQfuwJNrq2uXU5orD11xkJbm5gElg/iLQBfBnpAZTJ9VNKTkGUwwhAPgLfIrT
LkTwd7Puqbe8q965K7kXqDoxT5fN+5nyyE9TbvlOEkThLRebWGVEgfViKoqnPYFW
ApejY05zmWrT9+wk984/ELNH0XAv54vITGGzrkgFKpt+h+MZ/1ZQfy3xapczBPTh
ui+U9Y2zJ3vdE3JxYGxNM0IVAWvrn0prCRpgGF+Id1uaT839UuYU1ukEFxJpynrh
iXHuryQgI2zNx4IDwSwIXfOLc1P0XR59eOaN+n+mw45faIOEaEOlChSC4SvDCtTT
3B1bK7ZqTFy8P8KQ+SjA4NIkl8yb7qLmEGmweJnSxSI1JUz/P3k+uC/FZd/Zxe/r
nFVdXYQtKVCZW0CMBMdHRE2T/3PFlx2nz/dzW+OrKyweKpDqcYjU+TyyrvXiT5Yv
E8ttpOzQsNJl+VdEWpfa4StgRgQxmazVM80OyE52cNKAixUyLpJ7pR4fsF1mcswc
BWdpSe/J8T6LRyTOAKoWoi2w8Mw/DMey4neQ1+py8tWuCqJix3NtyduVtsvrRiYd
aPcXvGfAvDbzEcMHnwz0SVpTqleK/7H7pYScBdWYnmUG5wJuxfhJHD/5JDQ1wmj0
jGHmCAddWW+8RrHZGrpX
=paUV
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 13/17] block: use bdrv_has_zero_init to return BDRV_BLOCK_ZERO
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 13/17] block: use bdrv_has_zero_init to return BDRV_BLOCK_ZERO Paolo Bonzini
@ 2013-07-17 17:50   ` Peter Lieven
  2013-07-19 19:37   ` Eric Blake
  1 sibling, 0 replies; 64+ messages in thread
From: Peter Lieven @ 2013-07-17 17:50 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, qemu-devel, stefanha

We should change this when bdi->discard_zeroes is available.

Peter

Am 16.07.2013 um 18:29 schrieb Paolo Bonzini <pbonzini@redhat.com>:

> Alternatively, this could use a "discard zeroes data" flag returned
> by bdrv_get_info.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> block.c | 8 +++++++-
> 1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/block.c b/block.c
> index 7ff0716..0c867b8 100644
> --- a/block.c
> +++ b/block.c
> @@ -2977,6 +2977,7 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>                                                      int nb_sectors, int *pnum)
> {
>     int64_t n;
> +    int64_t ret;
> 
>     if (sector_num >= bs->total_sectors) {
>         *pnum = 0;
> @@ -2993,7 +2994,12 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>         return BDRV_BLOCK_DATA;
>     }
> 
> -    return bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum);
> +    ret = bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum);
> +    if (!(ret & BDRV_BLOCK_DATA) && bdrv_has_zero_init(bs)) {
> +        ret |= BDRV_BLOCK_ZERO;
> +    }
> +
> +    return ret;
> }
> 
> /* Coroutine wrapper for bdrv_get_block_status() */
> -- 
> 1.8.3.1
> 
> 

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 12/17] qemu-img: add a "map" subcommand
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 12/17] qemu-img: add a "map" subcommand Paolo Bonzini
@ 2013-07-18  7:25   ` Fam Zheng
  2013-07-18  8:55     ` Paolo Bonzini
  2013-07-19 19:36   ` Eric Blake
  1 sibling, 1 reply; 64+ messages in thread
From: Fam Zheng @ 2013-07-18  7:25 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: pl, qemu-devel, stefanha

On Tue, 07/16 18:29, Paolo Bonzini wrote:
> This command dumps the metadata of an entire chain, in either tabular or JSON
> format.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: mention encrypted clusters, use PRId64
> 
>  qemu-img-cmds.hx |   6 ++
>  qemu-img.c       | 185 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 191 insertions(+)
> 
> diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
> index 4ca7e95..acc3a1c 100644
> --- a/qemu-img-cmds.hx
> +++ b/qemu-img-cmds.hx
> @@ -45,6 +45,12 @@ STEXI
>  @item info [-f @var{fmt}] [--output=@var{ofmt}] [--backing-chain] @var{filename}
>  ETEXI
>  
> +DEF("map", img_map,
> +    "map [-f fmt] [--output=ofmt] filename")
> +STEXI
> +@item map [-f @var{fmt}] [--output=@var{ofmt}] filename
> +ETEXI
> +
>  DEF("snapshot", img_snapshot,
>      "snapshot [-q] [-l | -a snapshot | -c snapshot | -d snapshot] filename")
>  STEXI
> diff --git a/qemu-img.c b/qemu-img.c
> index c5c8ebc..b905bb6 100644
> --- a/qemu-img.c
> +++ b/qemu-img.c
> @@ -1768,6 +1768,191 @@ static int img_info(int argc, char **argv)
>      return 0;
>  }
>  
> +
> +typedef struct MapEntry {
> +    int flags;
> +    int depth;
> +    int64_t start;
> +    int64_t length;
> +    int64_t offset;
> +} MapEntry;
> +
> +static void dump_map_entry(OutputFormat output_format, MapEntry *e,
> +                           MapEntry *next)
> +{
> +    switch (output_format) {
> +    case OFORMAT_HUMAN:
> +        if ((e->flags & (BDRV_BLOCK_DATA|BDRV_BLOCK_ZERO)) == BDRV_BLOCK_DATA) {
> +            printf("%"PRId64" %"PRId64" %d %"PRId64"\n",
> +                   e->start, e->length, e->depth, e->offset);
> +        }
> +        /* This format ignores the distinction between 0, ZERO and ZERO|DATA.
> +         * Modify the flags here to allow more coalescing.
> +         */
> +        if (next &&
> +            (next->flags & (BDRV_BLOCK_DATA|BDRV_BLOCK_ZERO)) != BDRV_BLOCK_DATA) {
> +            next->flags &= ~BDRV_BLOCK_DATA;
> +            next->flags |= BDRV_BLOCK_ZERO;
> +        }
> +        break;
> +    case OFORMAT_JSON:
> +        printf("%s{ 'start': %"PRId64", 'length': %"PRId64", 'depth': %d, "
> +               "'zero': %s, 'data': %s",
> +               (e->start == 0 ? "[" : ",\n"),
> +               e->start, e->length, e->depth,
> +               (e->flags & BDRV_BLOCK_ZERO) ? "true" : "false",
> +               (e->flags & BDRV_BLOCK_DATA) ? "true" : "false");
> +        if (e->flags & BDRV_BLOCK_OFFSET_VALID) {
> +            printf(", 'offset': %"PRId64"", e->offset);
> +        }
> +        putchar('}');
> +
> +        if (!next) {
> +            printf("]\n");
> +        }
> +        break;
> +    }
> +}
> +
> +static int64_t get_block_status(BlockDriverState *bs, int64_t sector_num,
> +                                int *nb_sectors, int *depth)
> +{
> +    int64_t ret;
> +
> +    /* As an optimization, we could cache the current range of unallocated
> +     * clusters in each file of the chain, and avoid querying the same
> +     * range repeatedly.
> +     */
> +
> +    *depth = 0;
> +    for (;;) {
> +        int orig_nb_sectors = *nb_sectors;
> +
> +        if (bs == NULL) {
> +            return 0;
> +        }
> +        ret = bdrv_get_block_status(bs, sector_num, *nb_sectors, nb_sectors);
> +        if (ret < 0) {
> +            return ret;
> +        }
> +
> +        if (ret & (BDRV_BLOCK_ZERO|BDRV_BLOCK_DATA)) {
> +            return ret;
> +        }
> +        if (!*nb_sectors) {
> +            /* Beyond the end of this image (while presumably within the
> +             * range of the BlockDriverState above this one).  The extra
> +             * data is read as zeroes.
> +             */
> +            *nb_sectors = orig_nb_sectors;
> +            return BDRV_BLOCK_ZERO;
> +        }
> +
> +        bs = bs->backing_hd;
> +        (*depth)++;
> +    }
> +}
> +
> +static int img_map(int argc, char **argv)
> +{
> +    int c;
> +    OutputFormat output_format = OFORMAT_HUMAN;
> +    BlockDriverState *bs;
> +    const char *filename, *fmt, *output;
> +    int64_t length;
> +    MapEntry curr = { .length = 0 }, next;
> +
> +    fmt = NULL;
> +    output = NULL;
> +    for(;;) {
> +        int option_index = 0;
> +        static const struct option long_options[] = {
> +            {"help", no_argument, 0, 'h'},
> +            {"format", required_argument, 0, 'f'},
> +            {"output", required_argument, 0, OPTION_OUTPUT},
> +            {0, 0, 0, 0}
> +        };
> +        c = getopt_long(argc, argv, "f:h",
> +                        long_options, &option_index);
> +        if (c == -1) {
> +            break;
> +        }
> +        switch(c) {
> +        case '?':
> +        case 'h':
> +            help();
> +            break;
> +        case 'f':
> +            fmt = optarg;
> +            break;
> +        case OPTION_OUTPUT:
> +            output = optarg;
> +            break;
> +        }
> +    }
> +    if (optind >= argc) {
> +        help();
> +    }
> +    filename = argv[optind++];
> +
> +    if (output && !strcmp(output, "json")) {
> +        output_format = OFORMAT_JSON;
> +    } else if (output && !strcmp(output, "human")) {
> +        output_format = OFORMAT_HUMAN;
> +    } else if (output) {
> +        error_report("--output must be used with human or json as argument.");
> +        return 1;
> +    }
> +
> +    bs = bdrv_new_open(filename, fmt, BDRV_O_FLAGS, true, false);
> +    if (!bs) {
> +        return 1;
> +    }
> +
> +    length = bdrv_getlength(bs);
> +    while (curr.start + curr.length < length) {
> +        int64_t nsectors_left, ret;
> +        int64_t sector_num;
> +        int n, depth, flags;
> +
> +        /* Probe up to 1 G at a time.  */
> +        sector_num = (curr.start + curr.length) >> BDRV_SECTOR_BITS;
> +        nsectors_left = DIV_ROUND_UP(length, BDRV_SECTOR_SIZE) - sector_num;
> +        n = MAX(1 << (30 - BDRV_SECTOR_BITS), nsectors_left);
> +        ret = get_block_status(bs, sector_num, &n, &depth);

Am I misunderstanding here, n is at least 1G, but in your comment it's
up to 1 G at a time?

> +
> +        if (ret < 0) {
> +            error_report("Could not read file metadata: %s", strerror(-ret));
> +            return 1;
> +        }
> +        if ((ret & BDRV_BLOCK_DATA) &&
> +            !(ret & BDRV_BLOCK_OFFSET_VALID)) {
> +            error_report("File contains external, encrypted or compressed clusters.");
> +            return 1;
> +        }
> +
> +        flags = ret & ~BDRV_BLOCK_OFFSET_MASK;
> +        ret &= BDRV_BLOCK_OFFSET_MASK;
> +        if (curr.length == 0 || curr.flags != flags || curr.depth != depth ||
> +            ((curr.flags & BDRV_BLOCK_OFFSET_VALID) &&
> +             curr.offset + curr.length != ret)) {
> +            next.flags = flags;
> +            next.depth = depth;
> +            next.start = sector_num << BDRV_SECTOR_BITS;
> +            next.offset = ret;
> +            next.length = 0;
> +            if (curr.length > 0) {
> +                dump_map_entry(output_format, &curr, &next);
> +            }
> +            curr = next;
> +        }
> +        curr.length += n << BDRV_SECTOR_BITS;
> +    }
> +
> +    dump_map_entry(output_format, &curr, NULL);
> +    return 0;
> +}
> +
>  #define SNAPSHOT_LIST   1
>  #define SNAPSHOT_CREATE 2
>  #define SNAPSHOT_APPLY  3
> -- 
> 1.8.3.1
> 
> 

-- 
Fam

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
                   ` (17 preceding siblings ...)
  2013-07-16 16:56 ` [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Eric Blake
@ 2013-07-18  8:17 ` Peter Lieven
  2013-07-18  8:58   ` Paolo Bonzini
  2013-07-19  4:45 ` Stefan Hajnoczi
  2013-07-19  7:34 ` Stefan Hajnoczi
  20 siblings, 1 reply; 64+ messages in thread
From: Peter Lieven @ 2013-07-18  8:17 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, qemu-devel, stefanha

On 16.07.2013 18:29, Paolo Bonzini wrote:
> This series adds a subcommand to "map" that can dump file metadata.
> Metadata that is dumped includes:
>
> - whether blocks are allocated in bs->file and, if so, where
>
> - whether blocks are zero
>
> - whether data is read from bs or bs->backing_hd
>
> Metadata is dumped for an entire chain of images.  One example usage is
> (for non-compressed, non-encrypted images) to transform the metadata
> into a Linux device-mapper setup, and make a qcow2 image available (for
> read only) as a block device.  Another possible usage is to determine
> the used areas of a file, and convert it in place to another format.
> Alternatively, the richer data can be used by block jobs to quickly
> determine if a block is zero without reading it.
>
> The patches implement a new operation, bdrv_co_get_block_status, that
> supersedes bdrv_co_is_allocated.  The bdrv_is_allocated API is still
> available of course, and implemented on top of the new operation.
>
> Patches 1-3 touch cow, which uses bdrv_co_is_allocated during its reads
> and writes.  I optimized it a bit because it was unbearably slow during
> testing.  With these patches (also tested with blkverify), booting of
> a cow guest image is not particularly slow.
>
> Patches 4 to 6 clean up the bdrv_is_allocated and bdrv_is_allocated_above
> implementation, eliminating some code duplication.
>
> Patches 7 and 8 tweak bdrv_has_zero_init and its usage in qemu-img in
> a way that helps when implementing the new API.
>
> Patches 9 to 11 implement bdrv_get_block_status and change the formats
> to report all the available information.
>
> Patch 12 adds the "map" subcommand to qemu-img.
>
> Finally, patches 13 to 17 tweak the implementation to extract more
> information from protocols, and combine this information with format
> metadata whenever possible.

maybe i have missed it in your patches, but would it be also possible to issue
DISCARDZEROES ioctl on linux to return zero status for block devices also?

Peter

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 12/17] qemu-img: add a "map" subcommand
  2013-07-18  7:25   ` Fam Zheng
@ 2013-07-18  8:55     ` Paolo Bonzini
  0 siblings, 0 replies; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-18  8:55 UTC (permalink / raw)
  To: famz; +Cc: pl, qemu-devel, stefanha

Il 18/07/2013 09:25, Fam Zheng ha scritto:
>> > +        /* Probe up to 1 G at a time.  */
>> > +        sector_num = (curr.start + curr.length) >> BDRV_SECTOR_BITS;
>> > +        nsectors_left = DIV_ROUND_UP(length, BDRV_SECTOR_SIZE) - sector_num;
>> > +        n = MAX(1 << (30 - BDRV_SECTOR_BITS), nsectors_left);
>> > +        ret = get_block_status(bs, sector_num, &n, &depth);
> Am I misunderstanding here, n is at least 1G, but in your comment it's
> up to 1 G at a time?
> 

You're right, it should be MIN.

Paolo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata
  2013-07-18  8:17 ` Peter Lieven
@ 2013-07-18  8:58   ` Paolo Bonzini
  0 siblings, 0 replies; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-18  8:58 UTC (permalink / raw)
  To: Peter Lieven; +Cc: famz, qemu-devel, stefanha

Il 18/07/2013 10:17, Peter Lieven ha scritto:
> 
> maybe i have missed it in your patches, but would it be also possible to issue
> DISCARDZEROES ioctl on linux to return zero status for block devices also?

I think that would belong in your patches.  Once we have discard_zeroes,
we can use it as you suggested.

It is a good occasion to ask Linux developers for clarifications about
the meaning of discard_zeroes in the presence of misaligned requests.
But because of legacy, we might need to query the kernel for the
granularity ourselves---which is ugly because it is not available
through a ioctl, only sysfs.

Paolo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
                   ` (18 preceding siblings ...)
  2013-07-18  8:17 ` Peter Lieven
@ 2013-07-19  4:45 ` Stefan Hajnoczi
  2013-07-19  7:34 ` Stefan Hajnoczi
  20 siblings, 0 replies; 64+ messages in thread
From: Stefan Hajnoczi @ 2013-07-19  4:45 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

On Tue, Jul 16, 2013 at 06:29:11PM +0200, Paolo Bonzini wrote:
> This series adds a subcommand to "map" that can dump file metadata.
> Metadata that is dumped includes:
> 
> - whether blocks are allocated in bs->file and, if so, where
> 
> - whether blocks are zero
> 
> - whether data is read from bs or bs->backing_hd
> 
> Metadata is dumped for an entire chain of images.  One example usage is
> (for non-compressed, non-encrypted images) to transform the metadata
> into a Linux device-mapper setup, and make a qcow2 image available (for
> read only) as a block device.  Another possible usage is to determine
> the used areas of a file, and convert it in place to another format.
> Alternatively, the richer data can be used by block jobs to quickly
> determine if a block is zero without reading it.
> 
> The patches implement a new operation, bdrv_co_get_block_status, that
> supersedes bdrv_co_is_allocated.  The bdrv_is_allocated API is still
> available of course, and implemented on top of the new operation.
> 
> Patches 1-3 touch cow, which uses bdrv_co_is_allocated during its reads
> and writes.  I optimized it a bit because it was unbearably slow during
> testing.  With these patches (also tested with blkverify), booting of
> a cow guest image is not particularly slow.
> 
> Patches 4 to 6 clean up the bdrv_is_allocated and bdrv_is_allocated_above
> implementation, eliminating some code duplication.
> 
> Patches 7 and 8 tweak bdrv_has_zero_init and its usage in qemu-img in
> a way that helps when implementing the new API.
> 
> Patches 9 to 11 implement bdrv_get_block_status and change the formats
> to report all the available information.
> 
> Patch 12 adds the "map" subcommand to qemu-img.
> 
> Finally, patches 13 to 17 tweak the implementation to extract more
> information from protocols, and combine this information with format
> metadata whenever possible.
> 
> Paolo
> 
> v1->v2: see patches 1, 6, 9, 10, 11, 12
> 
> Paolo Bonzini (17):
>   cow: make reads go at a decent speed
>   cow: make writes go at a less indecent speed
>   cow: do not call bdrv_co_is_allocated
>   block: make bdrv_co_is_allocated static
>   block: remove bdrv_is_allocated_above/bdrv_co_is_allocated_above
>     distinction
>   block: expect errors from bdrv_co_is_allocated
>   qemu-img: always probe the input image for allocated sectors
>   block: make bdrv_has_zero_init return false for copy-on-write-images
>   block: introduce bdrv_get_block_status API
>   block: define get_block_status return value
>   block: return get_block_status data and flags for formats
>   qemu-img: add a "map" subcommand
>   block: use bdrv_has_zero_init to return BDRV_BLOCK_ZERO
>   raw-posix: return get_block_status data and flags
>   raw-posix: detect XFS unwritten extents
>   block: add default get_block_status implementation for protocols
>   block: look for zero blocks in bs->file
> 
>  block.c                   | 134 +++++++++++++--------------
>  block/commit.c            |   6 +-
>  block/cow.c               |  90 +++++++++++++------
>  block/mirror.c            |   4 +-
>  block/qcow.c              |  13 ++-
>  block/qcow2.c             |  24 +++--
>  block/qed.c               |  39 ++++++--
>  block/raw-posix.c         |  24 +++--
>  block/raw.c               |   6 +-
>  block/sheepdog.c          |  14 +--
>  block/stream.c            |  10 +--
>  block/vdi.c               |  17 +++-
>  block/vmdk.c              |  23 ++++-
>  block/vvfat.c             |  15 ++--
>  include/block/block.h     |  34 +++++--
>  include/block/block_int.h |   2 +-
>  qemu-img-cmds.hx          |   6 ++
>  qemu-img.c                | 225 +++++++++++++++++++++++++++++++++++++++++-----
>  18 files changed, 503 insertions(+), 183 deletions(-)

Applied s/MAX/MIN/ fixup suggested by Fam.

Thanks, applied to my block tree:
https://github.com/stefanha/qemu/commits/block

Stefan

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value Paolo Bonzini
@ 2013-07-19  6:35   ` Peter Lieven
  2013-07-19  9:14     ` Paolo Bonzini
  2013-07-19  6:48   ` Peter Lieven
  2013-07-19 16:52   ` Eric Blake
  2 siblings, 1 reply; 64+ messages in thread
From: Peter Lieven @ 2013-07-19  6:35 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, qemu-devel, stefanha

On 16.07.2013 18:29, Paolo Bonzini wrote:
> Define the return value of get_block_status.  Bits 0, 1, 2 and 9-62
> are valid; bit 63 (the sign bit) is reserved for errors.  Bits 3-7
> are left for future extensions.
>
> The return code is compatible with the old is_allocated API: returning
> just 0 or 1 (aka BDRV_BLOCK_DATA) will not cause any behavioral change
> in clients of is_allocated.  We will return more precise information
> in the next patches.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>          v1->v2: improved comment
>
>   block.c               |  7 +++++--
>   include/block/block.h | 26 ++++++++++++++++++++++++++
>   2 files changed, 31 insertions(+), 2 deletions(-)
>
> diff --git a/block.c b/block.c
> index 6e7a8a3..7ff0716 100644
> --- a/block.c
> +++ b/block.c
> @@ -2990,7 +2990,7 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>   
>       if (!bs->drv->bdrv_co_get_block_status) {
>           *pnum = nb_sectors;
> -        return 1;
> +        return BDRV_BLOCK_DATA;
>       }
>   
>       return bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum);
> @@ -3040,7 +3040,10 @@ int64_t bdrv_get_block_status(BlockDriverState *bs, int64_t sector_num,
>   int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num,
>                                      int nb_sectors, int *pnum)
>   {
> -    return bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
> +    int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
just stumbled about a question i have:

isn't here a

if (ret < 0) {
  *pnum = 0;
  return 0;
}

missing?

I was told that this is the expected return behaviour in the
error case of the old API.

Peter

> +    return
> +        (ret & BDRV_BLOCK_DATA) ||
> +        ((ret & BDRV_BLOCK_ZERO) && !bdrv_has_zero_init(bs));
>   }
>   
>   /*
> diff --git a/include/block/block.h b/include/block/block.h
> index fbc0822..83c7112 100644
> --- a/include/block/block.h
> +++ b/include/block/block.h
> @@ -81,6 +81,32 @@ typedef struct BlockDevOps {
>   #define BDRV_SECTOR_SIZE   (1ULL << BDRV_SECTOR_BITS)
>   #define BDRV_SECTOR_MASK   ~(BDRV_SECTOR_SIZE - 1)
>   
> +/* BDRV_BLOCK_DATA: data is read from bs->file or another file
> + * BDRV_BLOCK_ZERO: sectors read as zero
> + * BDRV_BLOCK_OFFSET_VALID: sector stored in bs->file as raw data
> + *
> + * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 represent the offset in
> + * bs->file where sector data can be read from as raw data.
> + *
> + * DATA == 0 && ZERO == 0 means that data is read from backing_hd if present.
> + *
> + * DATA ZERO OFFSET_VALID
> + *  t    t        t       sectors read as zero, bs->file is zero at offset
> + *  t    f        t       sectors read as valid from bs->file at offset
> + *  f    t        t       sectors preallocated, read as zero, bs->file not
> + *                        necessarily zero at offset
> + *  f    f        t       sectors preallocated but read from backing_hd,
> + *                        bs->file contains garbage at offset
> + *  t    t        f       sectors preallocated, read as zero, unknown offset
> + *  t    f        f       sectors read from unknown file or offset
> + *  f    t        f       not allocated or unknown offset, read as zero
> + *  f    f        f       not allocated or unknown offset, read from backing_hd
> + */
> +#define BDRV_BLOCK_DATA         1
> +#define BDRV_BLOCK_ZERO         2
> +#define BDRV_BLOCK_OFFSET_VALID 4
> +#define BDRV_BLOCK_OFFSET_MASK  BDRV_SECTOR_MASK
> +
>   typedef enum {
>       BDRV_ACTION_REPORT, BDRV_ACTION_IGNORE, BDRV_ACTION_STOP
>   } BlockErrorAction;


-- 

Mit freundlichen Grüßen

Peter Lieven

...........................................................

   KAMP Netzwerkdienste GmbH
   Vestische Str. 89-91 | 46117 Oberhausen
   Tel: +49 (0) 208.89 402-50 | Fax: +49 (0) 208.89 402-40
   pl@kamp.de | http://www.kamp.de

   Geschäftsführer: Heiner Lante | Michael Lante
   Amtsgericht Duisburg | HRB Nr. 12154
   USt-Id-Nr.: DE 120607556

...........................................................

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value Paolo Bonzini
  2013-07-19  6:35   ` Peter Lieven
@ 2013-07-19  6:48   ` Peter Lieven
  2013-07-19  9:12     ` Paolo Bonzini
  2013-07-19  9:58     ` Paolo Bonzini
  2013-07-19 16:52   ` Eric Blake
  2 siblings, 2 replies; 64+ messages in thread
From: Peter Lieven @ 2013-07-19  6:48 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, qemu-devel, stefanha

On 16.07.2013 18:29, Paolo Bonzini wrote:
> Define the return value of get_block_status.  Bits 0, 1, 2 and 9-62
> are valid; bit 63 (the sign bit) is reserved for errors.  Bits 3-7
> are left for future extensions.
>
> The return code is compatible with the old is_allocated API: returning
> just 0 or 1 (aka BDRV_BLOCK_DATA) will not cause any behavioral change
> in clients of is_allocated.  We will return more precise information
> in the next patches.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>          v1->v2: improved comment
>
>   block.c               |  7 +++++--
>   include/block/block.h | 26 ++++++++++++++++++++++++++
>   2 files changed, 31 insertions(+), 2 deletions(-)
>
> diff --git a/block.c b/block.c
> index 6e7a8a3..7ff0716 100644
> --- a/block.c
> +++ b/block.c
> @@ -2990,7 +2990,7 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>   
>       if (!bs->drv->bdrv_co_get_block_status) {
>           *pnum = nb_sectors;
> -        return 1;
> +        return BDRV_BLOCK_DATA;
>       }
>   
>       return bs->drv->bdrv_co_get_block_status(bs, sector_num, nb_sectors, pnum);
> @@ -3040,7 +3040,10 @@ int64_t bdrv_get_block_status(BlockDriverState *bs, int64_t sector_num,
>   int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num,
>                                      int nb_sectors, int *pnum)
>   {
> -    return bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
> +    int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
> +    return
> +        (ret & BDRV_BLOCK_DATA) ||
> +        ((ret & BDRV_BLOCK_ZERO) && !bdrv_has_zero_init(bs));

i do also not understand the "((ret & BDRV_BLOCK_ZERO) && !bdrv_has_zero_init(bs))";
if a block is unallocated and reads as zero, but the device lacks zero init, it
is declared as allocated with this, isn't it?

for iscsi and host_device with lbprz==1 or discardzeroes respectively all
blocks would return as allocated. is this wanted?

Peter

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 17/17] block: look for zero blocks in bs->file
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 17/17] block: look for zero blocks in bs->file Paolo Bonzini
@ 2013-07-19  7:33   ` Stefan Hajnoczi
  2013-07-19 17:07     ` Paolo Bonzini
  2013-07-19 20:12   ` Eric Blake
  1 sibling, 1 reply; 64+ messages in thread
From: Stefan Hajnoczi @ 2013-07-19  7:33 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

On Tue, Jul 16, 2013 at 06:29:28PM +0200, Paolo Bonzini wrote:
> diff --git a/block.c b/block.c
> index 557ce29..2d7d71f 100644
> --- a/block.c
> +++ b/block.c
> @@ -2977,7 +2977,7 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>                                                       int nb_sectors, int *pnum)
>  {
>      int64_t n;
> -    int64_t ret;
> +    int64_t ret, ret2;
>  
>      if (sector_num >= bs->total_sectors) {
>          *pnum = 0;
> @@ -3003,6 +3003,14 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>          ret |= BDRV_BLOCK_ZERO;
>      }
>  
> +    if (bs->file &&
> +        (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
> +        (ret & BDRV_BLOCK_OFFSET_VALID)) {
> +        ret2 = bdrv_co_get_block_status(bs->file, ret >> BDRV_SECTOR_BITS,
> +                                        *pnum, pnum);
> +        ret |= (ret2 & BDRV_BLOCK_ZERO);
> +    }

This patch breaks qemu-iotests 030 (image streaming).

The problem is that bdrv_co_get_block_status() uses bs->total_sectors
directly instead of calling bdv_get_geometry()/bdrv_getlength().

With qcow2 the bs->file can grow on disk.  We don't update
bs->total_sectors.

Then this patch calls bdrv_co_get_block_status(bs->file) where we fail
with *pnum = 0, ret = 0 because bs->total_sectors suggests it is beyond
the end of the file.

The result is that 030 goes into an infinite loop.

As a quick test I switched the direct bs->total_sectors accesses to
bdrv_get_geometry() and it stopped hanging.  Perhaps the
bs->total_sectors caching needs to be improved though.

Stefan

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata
  2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
                   ` (19 preceding siblings ...)
  2013-07-19  4:45 ` Stefan Hajnoczi
@ 2013-07-19  7:34 ` Stefan Hajnoczi
  20 siblings, 0 replies; 64+ messages in thread
From: Stefan Hajnoczi @ 2013-07-19  7:34 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

On Tue, Jul 16, 2013 at 06:29:11PM +0200, Paolo Bonzini wrote:
> This series adds a subcommand to "map" that can dump file metadata.
> Metadata that is dumped includes:
> 
> - whether blocks are allocated in bs->file and, if so, where
> 
> - whether blocks are zero
> 
> - whether data is read from bs or bs->backing_hd

I dropped this series from my block pull request again and left a
comment on the final patch, which breaks qemu-iotests 030.

Stefan

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value
  2013-07-19  6:48   ` Peter Lieven
@ 2013-07-19  9:12     ` Paolo Bonzini
  2013-07-19  9:58     ` Paolo Bonzini
  1 sibling, 0 replies; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-19  9:12 UTC (permalink / raw)
  To: Peter Lieven; +Cc: famz, qemu-devel, stefanha

Il 19/07/2013 08:48, Peter Lieven ha scritto:
>> @@ -3040,7 +3040,10 @@ int64_t bdrv_get_block_status(BlockDriverState
>> *bs, int64_t sector_num,
>>   int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t
>> sector_num,
>>                                      int nb_sectors, int *pnum)
>>   {
>> -    return bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
>> +    int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors,
>> pnum);
>> +    return
>> +        (ret & BDRV_BLOCK_DATA) ||
>> +        ((ret & BDRV_BLOCK_ZERO) && !bdrv_has_zero_init(bs));
> 
> i do also not understand the "((ret & BDRV_BLOCK_ZERO) &&
> !bdrv_has_zero_init(bs))";
> if a block is unallocated and reads as zero, but the device lacks zero
> init, it is declared as allocated with this, isn't it?

Perhaps bdrv_has_zero_init(bs) could be replaced by bs->backing_hd.  I'd
have to look at the code more closely.

But I suggest that you try using get_block_status (the function in
qemu-img.c) instead of bdrv_is_allocated in qemu-img.c.  It will
probably simplify your code _and_ make it more efficient.

> for iscsi and host_device with lbprz==1 or discardzeroes respectively all
> blocks would return as allocated. is this wanted?

Paolo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value
  2013-07-19  6:35   ` Peter Lieven
@ 2013-07-19  9:14     ` Paolo Bonzini
  0 siblings, 0 replies; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-19  9:14 UTC (permalink / raw)
  To: Peter Lieven; +Cc: famz, qemu-devel, stefanha

Il 19/07/2013 08:35, Peter Lieven ha scritto:
> On 16.07.2013 18:29, Paolo Bonzini wrote:
>> Define the return value of get_block_status.  Bits 0, 1, 2 and 9-62
>> are valid; bit 63 (the sign bit) is reserved for errors.  Bits 3-7
>> are left for future extensions.
>>
>> The return code is compatible with the old is_allocated API: returning
>> just 0 or 1 (aka BDRV_BLOCK_DATA) will not cause any behavioral change
>> in clients of is_allocated.  We will return more precise information
>> in the next patches.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>          v1->v2: improved comment
>>
>>   block.c               |  7 +++++--
>>   include/block/block.h | 26 ++++++++++++++++++++++++++
>>   2 files changed, 31 insertions(+), 2 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index 6e7a8a3..7ff0716 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -2990,7 +2990,7 @@ static int64_t coroutine_fn
>> bdrv_co_get_block_status(BlockDriverState *bs,
>>         if (!bs->drv->bdrv_co_get_block_status) {
>>           *pnum = nb_sectors;
>> -        return 1;
>> +        return BDRV_BLOCK_DATA;
>>       }
>>         return bs->drv->bdrv_co_get_block_status(bs, sector_num,
>> nb_sectors, pnum);
>> @@ -3040,7 +3040,10 @@ int64_t bdrv_get_block_status(BlockDriverState
>> *bs, int64_t sector_num,
>>   int coroutine_fn bdrv_is_allocated(BlockDriverState *bs, int64_t
>> sector_num,
>>                                      int nb_sectors, int *pnum)
>>   {
>> -    return bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
>> +    int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors,
>> pnum);
> just stumbled about a question i have:
> 
> isn't here a
> 
> if (ret < 0) {
>  *pnum = 0;
>  return 0;
> }
> 
> missing?
> 
> I was told that this is the expected return behaviour in the
> error case of the old API.

Earlier in the series this is made more reasonable (is_allocated can
return errors), but indeed a

    if (ret < 0) {
        return ret;
    }

is missing.  I'll send a follow-up.

Paolo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value
  2013-07-19  6:48   ` Peter Lieven
  2013-07-19  9:12     ` Paolo Bonzini
@ 2013-07-19  9:58     ` Paolo Bonzini
  2013-07-19 10:04       ` Peter Lieven
  1 sibling, 1 reply; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-19  9:58 UTC (permalink / raw)
  To: Peter Lieven; +Cc: famz, qemu-devel, stefanha

Il 19/07/2013 08:48, Peter Lieven ha scritto:
>>
>> -    return bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
>> +    int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors,
>> pnum);
>> +    return
>> +        (ret & BDRV_BLOCK_DATA) ||
>> +        ((ret & BDRV_BLOCK_ZERO) && !bdrv_has_zero_init(bs));
> 
> i do also not understand the "((ret & BDRV_BLOCK_ZERO) &&
> !bdrv_has_zero_init(bs))";
> if a block is unallocated and reads as zero, but the device lacks zero
> init, it is declared as allocated with this, isn't it?

Thinking more about it, I would say it is a bugfix.

If a block is unallocated and reads as zero, but the device lacks zero
init, the block contents have changed since the guest was created.  Thus
the guest might well be relying on the zero content of the block, and it
should be treated as allocated.

In any case, better safe than sorry---if in doubt, the conservative
answer is always to return 1, and callers who want more precise
information can use bdrv_get_block_status.

Paolo
> for iscsi and host_device with lbprz==1 or discardzeroes respectively all
> blocks would return as allocated. is this wanted?

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value
  2013-07-19  9:58     ` Paolo Bonzini
@ 2013-07-19 10:04       ` Peter Lieven
  2013-07-19 12:13         ` Paolo Bonzini
  0 siblings, 1 reply; 64+ messages in thread
From: Peter Lieven @ 2013-07-19 10:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, qemu-devel, stefanha

On 19.07.2013 11:58, Paolo Bonzini wrote:
> Il 19/07/2013 08:48, Peter Lieven ha scritto:
>>> -    return bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
>>> +    int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors,
>>> pnum);
>>> +    return
>>> +        (ret & BDRV_BLOCK_DATA) ||
>>> +        ((ret & BDRV_BLOCK_ZERO) && !bdrv_has_zero_init(bs));
>> i do also not understand the "((ret & BDRV_BLOCK_ZERO) &&
>> !bdrv_has_zero_init(bs))";
>> if a block is unallocated and reads as zero, but the device lacks zero
>> init, it is declared as allocated with this, isn't it?
> Thinking more about it, I would say it is a bugfix.
>
> If a block is unallocated and reads as zero, but the device lacks zero
> init, the block contents have changed since the guest was created.  Thus
> the guest might well be relying on the zero content of the block, and it
> should be treated as allocated.
i was told that has_zero_init sole task is to report the state
of the device right after iscsi_create(). using it for r/w in qemu
might be a misuse.

>
> In any case, better safe than sorry---if in doubt, the conservative
> answer is always to return 1, and callers who want more precise
> information can use bdrv_get_block_status.
you mean for every call to the deprecated bdrv_is_allocate()?

Peter

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata
  2013-07-16 16:57   ` Paolo Bonzini
@ 2013-07-19 10:48     ` Wenchao Xia
  2013-07-19 12:14       ` Paolo Bonzini
  0 siblings, 1 reply; 64+ messages in thread
From: Wenchao Xia @ 2013-07-19 10:48 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: pl, famz, qemu-devel, stefanha

于 2013-7-17 0:57, Paolo Bonzini 写道:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Il 16/07/2013 18:56, Eric Blake ha scritto:
>> On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
>>> This series adds a subcommand to "map" that can dump file
>>> metadata. Metadata that is dumped includes:
>>>
>>> - whether blocks are allocated in bs->file and, if so, where
>>>
>>> - whether blocks are zero
>>>
>>> - whether data is read from bs or bs->backing_hd
>>
>> Is this command only usable from qemu-img, or is there a QMP
>> counterpart for getting at the same information on a live image
>> while qemu is running?
>
> Not yet, we had no use case yet.  It may be exposed later to the
> guests via the SCSI command GET LBA STATUS.
>
> Paolo
>
   I think dump of allocated info in qemu-img in this series, can do
dirty change tracking job for backing chain snapshot now, qemu's QMP
interface is not very needed.
   Only thing not perfect, is that it talks with string. A library would
be better, but I am not selling libqblock:), instead I'd like to form
a better and clean library so want to adjust qemu's block layer first.

>>>
>>> Metadata is dumped for an entire chain of images.  One example
>>> usage is (for non-compressed, non-encrypted images) to transform
>>> the metadata into a Linux device-mapper setup, and make a qcow2
>>> image available (for read only) as a block device.  Another
>>> possible usage is to determine the used areas of a file, and
>>> convert it in place to another format. Alternatively, the richer
>>> data can be used by block jobs to quickly determine if a block is
>>> zero without reading it.
>>
>> Definitely useful information!  I haven't looked at the
>> implementation yet, but there had better be a JSON representation
>> of the output :)
>>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v2.0.19 (GNU/Linux)
> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>
> iQIcBAEBAgAGBQJR5XtfAAoJEBvWZb6bTYbypRUP/jadsxWgkWGbefpIqUL7Iykt
> EiPeQfuwJNrq2uXU5orD11xkJbm5gElg/iLQBfBnpAZTJ9VNKTkGUwwhAPgLfIrT
> LkTwd7Puqbe8q965K7kXqDoxT5fN+5nyyE9TbvlOEkThLRebWGVEgfViKoqnPYFW
> ApejY05zmWrT9+wk984/ELNH0XAv54vITGGzrkgFKpt+h+MZ/1ZQfy3xapczBPTh
> ui+U9Y2zJ3vdE3JxYGxNM0IVAWvrn0prCRpgGF+Id1uaT839UuYU1ukEFxJpynrh
> iXHuryQgI2zNx4IDwSwIXfOLc1P0XR59eOaN+n+mw45faIOEaEOlChSC4SvDCtTT
> 3B1bK7ZqTFy8P8KQ+SjA4NIkl8yb7qLmEGmweJnSxSI1JUz/P3k+uC/FZd/Zxe/r
> nFVdXYQtKVCZW0CMBMdHRE2T/3PFlx2nz/dzW+OrKyweKpDqcYjU+TyyrvXiT5Yv
> E8ttpOzQsNJl+VdEWpfa4StgRgQxmazVM80OyE52cNKAixUyLpJ7pR4fsF1mcswc
> BWdpSe/J8T6LRyTOAKoWoi2w8Mw/DMey4neQ1+py8tWuCqJix3NtyduVtsvrRiYd
> aPcXvGfAvDbzEcMHnwz0SVpTqleK/7H7pYScBdWYnmUG5wJuxfhJHD/5JDQ1wmj0
> jGHmCAddWW+8RrHZGrpX
> =paUV
> -----END PGP SIGNATURE-----
>


-- 
Best Regards

Wenchao Xia

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value
  2013-07-19 10:04       ` Peter Lieven
@ 2013-07-19 12:13         ` Paolo Bonzini
  2013-07-19 13:06           ` Peter Lieven
  0 siblings, 1 reply; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-19 12:13 UTC (permalink / raw)
  To: Peter Lieven; +Cc: famz, qemu-devel, stefanha

Il 19/07/2013 12:04, Peter Lieven ha scritto:
> On 19.07.2013 11:58, Paolo Bonzini wrote:
>> Il 19/07/2013 08:48, Peter Lieven ha scritto:
>>>> -    return bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
>>>> +    int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors,
>>>> pnum);
>>>> +    return
>>>> +        (ret & BDRV_BLOCK_DATA) ||
>>>> +        ((ret & BDRV_BLOCK_ZERO) && !bdrv_has_zero_init(bs));
>>> i do also not understand the "((ret & BDRV_BLOCK_ZERO) &&
>>> !bdrv_has_zero_init(bs))";
>>> if a block is unallocated and reads as zero, but the device lacks zero
>>> init, it is declared as allocated with this, isn't it?
>> Thinking more about it, I would say it is a bugfix.
>>
>> If a block is unallocated and reads as zero, but the device lacks zero
>> init, the block contents have changed since the guest was created.  Thus
>> the guest might well be relying on the zero content of the block, and it
>> should be treated as allocated.
> i was told that has_zero_init sole task is to report the state
> of the device right after iscsi_create(). using it for r/w in qemu
> might be a misuse.

Yes, and here I'm using it exactly for that task.  I'm saying "treat a
zero block as allocated if it wasn't zero right after _create()".

>> In any case, better safe than sorry---if in doubt, the conservative
>> answer is always to return 1, and callers who want more precise
>> information can use bdrv_get_block_status.
> 
> you mean for every call to the deprecated bdrv_is_allocate()?

Of course I'm not advocating returning 1 for every call.  But if the
semantic isn't clear, I prefer to err on the safe side.

Paolo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata
  2013-07-19 10:48     ` Wenchao Xia
@ 2013-07-19 12:14       ` Paolo Bonzini
  2013-07-20  0:19         ` Wenchao Xia
  0 siblings, 1 reply; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-19 12:14 UTC (permalink / raw)
  To: Wenchao Xia; +Cc: pl, famz, qemu-devel, stefanha

Il 19/07/2013 12:48, Wenchao Xia ha scritto:
>   I think dump of allocated info in qemu-img in this series, can do
> dirty change tracking job for backing chain snapshot now, qemu's QMP
> interface is not very needed.
>   Only thing not perfect, is that it talks with string.

It uses JSON, actually.

> A library would be better, but I am not selling libqblock:),

Oh you should!  Are you going back to it?  It was very close to
mergeable state.

Paolo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 01/17] cow: make reads go at a decent speed
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 01/17] cow: make reads go at a decent speed Paolo Bonzini
@ 2013-07-19 12:31   ` Eric Blake
  0 siblings, 0 replies; 64+ messages in thread
From: Eric Blake @ 2013-07-19 12:31 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 629 bytes --]

On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
> Do not do two reads for each sector; load each sector of the bitmap
> and use bitmap operations to process it.
> 
> Writes are still dog slow!
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: use BDRV_SECTOR_SIZE, not 512 for bitmap array length
> 
>  block/cow.c | 54 ++++++++++++++++++++++++++++++++----------------------
>  1 file changed, 32 insertions(+), 22 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 02/17] cow: make writes go at a less indecent speed
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 02/17] cow: make writes go at a less indecent speed Paolo Bonzini
@ 2013-07-19 12:40   ` Eric Blake
  0 siblings, 0 replies; 64+ messages in thread
From: Eric Blake @ 2013-07-19 12:40 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 425 bytes --]

On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
> Only sync once per write, rather than once per sector.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/cow.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 03/17] cow: do not call bdrv_co_is_allocated
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 03/17] cow: do not call bdrv_co_is_allocated Paolo Bonzini
@ 2013-07-19 12:42   ` Eric Blake
  0 siblings, 0 replies; 64+ messages in thread
From: Eric Blake @ 2013-07-19 12:42 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 573 bytes --]

On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
> As we change bdrv_is_allocated to gather more information from bs and
> bs->file, it will become a bit slower.  It is still appropriate for online
> jobs, but not for reads/writes.  Call the internal function instead.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/cow.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 04/17] block: make bdrv_co_is_allocated static
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 04/17] block: make bdrv_co_is_allocated static Paolo Bonzini
@ 2013-07-19 12:44   ` Eric Blake
  0 siblings, 0 replies; 64+ messages in thread
From: Eric Blake @ 2013-07-19 12:44 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 609 bytes --]

On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
> bdrv_is_allocated can detect coroutine context and go through a fast
> path, similar to other block layer functions.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block.c               | 24 +++++++++++++++---------
>  block/raw.c           |  2 +-
>  block/stream.c        |  4 ++--
>  include/block/block.h |  2 --
>  4 files changed, 18 insertions(+), 14 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 05/17] block: remove bdrv_is_allocated_above/bdrv_co_is_allocated_above distinction
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 05/17] block: remove bdrv_is_allocated_above/bdrv_co_is_allocated_above distinction Paolo Bonzini
@ 2013-07-19 12:46   ` Eric Blake
  0 siblings, 0 replies; 64+ messages in thread
From: Eric Blake @ 2013-07-19 12:46 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 643 bytes --]

On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
> Now that bdrv_is_allocated detects coroutine context, the two can
> use the same code.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block.c               | 46 ++++------------------------------------------
>  block/commit.c        |  6 +++---
>  block/mirror.c        |  4 ++--
>  block/stream.c        |  4 ++--
>  include/block/block.h |  4 ----
>  5 files changed, 11 insertions(+), 53 deletions(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 06/17] block: expect errors from bdrv_co_is_allocated
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 06/17] block: expect errors from bdrv_co_is_allocated Paolo Bonzini
@ 2013-07-19 12:57   ` Eric Blake
  0 siblings, 0 replies; 64+ messages in thread
From: Eric Blake @ 2013-07-19 12:57 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha, qemu-stable

[-- Attachment #1: Type: text/plain, Size: 1183 bytes --]

On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
> Some bdrv_is_allocated callers do not expect errors, but the fallback
> in qcow2.c might make other callers trip on assertion failures or
> infinite loops.
> 
> Fix the callers to always look for errors.
> 
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: modify error message, add strerror(-ret)

> +++ b/qemu-img.c
> @@ -2073,6 +2073,11 @@ static int img_rebase(int argc, char **argv)
>  
>              /* If the cluster is allocated, we don't need to take action */
>              ret = bdrv_is_allocated(bs, sector, n, &n);
> +            if (ret < 0) {
> +                error_report("error while reading image metadata: %s",
> +                             strerror(-ret));

Hmm, we have error_setg_errno so that callers don't have to use
strerror(); is it time to introduce error_report_errno for the same
convenience factor?  But that's a side question that does not impact
this patch.

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 07/17] qemu-img: always probe the input image for allocated sectors
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 07/17] qemu-img: always probe the input image for allocated sectors Paolo Bonzini
@ 2013-07-19 13:05   ` Eric Blake
  0 siblings, 0 replies; 64+ messages in thread
From: Eric Blake @ 2013-07-19 13:05 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 733 bytes --]

On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
> qemu-img convert is assuming "that sectors which are unallocated in the input
> image are present in both the output's and input's base images", but it is
> only doing this if the output image is zero initialized.  And checking if
> the output image is zero initialized does not make much sense if the
> output image is copy-on-write.  Always do the test.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  qemu-img.c | 26 ++++++++++++--------------
>  1 file changed, 12 insertions(+), 14 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value
  2013-07-19 12:13         ` Paolo Bonzini
@ 2013-07-19 13:06           ` Peter Lieven
  2013-07-20  7:00             ` Paolo Bonzini
  0 siblings, 1 reply; 64+ messages in thread
From: Peter Lieven @ 2013-07-19 13:06 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, qemu-devel, stefanha

On 19.07.2013 14:13, Paolo Bonzini wrote:
> Il 19/07/2013 12:04, Peter Lieven ha scritto:
>> On 19.07.2013 11:58, Paolo Bonzini wrote:
>>> Il 19/07/2013 08:48, Peter Lieven ha scritto:
>>>>> -    return bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
>>>>> +    int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors,
>>>>> pnum);
>>>>> +    return
>>>>> +        (ret & BDRV_BLOCK_DATA) ||
>>>>> +        ((ret & BDRV_BLOCK_ZERO) && !bdrv_has_zero_init(bs));
>>>> i do also not understand the "((ret & BDRV_BLOCK_ZERO) &&
>>>> !bdrv_has_zero_init(bs))";
>>>> if a block is unallocated and reads as zero, but the device lacks zero
>>>> init, it is declared as allocated with this, isn't it?
>>> Thinking more about it, I would say it is a bugfix.
>>>
>>> If a block is unallocated and reads as zero, but the device lacks zero
>>> init, the block contents have changed since the guest was created.  Thus
>>> the guest might well be relying on the zero content of the block, and it
>>> should be treated as allocated.
>> i was told that has_zero_init sole task is to report the state
>> of the device right after iscsi_create(). using it for r/w in qemu
>> might be a misuse.
> Yes, and here I'm using it exactly for that task.  I'm saying "treat a
> zero block as allocated if it wasn't zero right after _create()".
If it is zero and allocated the API should return only BDRV_BLOCK_DATA
and if it is zero and unallocated only BDRV_BLOCK_ZERO or not?

What I mean is the new API shouldn't change the behaviour of the old bdrv_is_allocated().
It would have returned

(ret & BDRV_BLOCK_DATA) regardless if BDRV_BLOCK_ZERO or not.


Peter

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 08/17] block: make bdrv_has_zero_init return false for copy-on-write-images
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 08/17] block: make bdrv_has_zero_init return false for copy-on-write-images Paolo Bonzini
@ 2013-07-19 13:21   ` Eric Blake
  2013-07-19 13:23     ` Paolo Bonzini
  0 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2013-07-19 13:21 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 1040 bytes --]

On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
> This helps implementing is_allocated on top of get_block_status.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block.c    | 5 +++++
>  qemu-img.c | 9 +--------
>  2 files changed, 6 insertions(+), 8 deletions(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

> +++ b/block.c
> @@ -2934,6 +2934,11 @@ int bdrv_has_zero_init(BlockDriverState *bs)
>  {
>      assert(bs->drv);
>  
> +    /* If BS is a copy on write image, it is initialized to
> +       the contents of the base image, which may not be zeroes.  */
> +    if (bs->backing_hd) {
> +        return 0;

Question (more for my understanding, not that you need to change code):
must we blindly return 0 in the presence of a backing file, or is it
possible to recursively query the backing_hd's zero_init status,
allowing us to return 1 iff all files in the chain support zero_init.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 08/17] block: make bdrv_has_zero_init return false for copy-on-write-images
  2013-07-19 13:21   ` Eric Blake
@ 2013-07-19 13:23     ` Paolo Bonzini
  0 siblings, 0 replies; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-19 13:23 UTC (permalink / raw)
  To: Eric Blake; +Cc: famz, pl, qemu-devel, stefanha

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Il 19/07/2013 15:21, Eric Blake ha scritto:
> Question (more for my understanding, not that you need to change
> code): must we blindly return 0 in the presence of a backing file,
> or is it possible to recursively query the backing_hd's zero_init
> status, allowing us to return 1 iff all files in the chain support
> zero_init.

No, it's really blind.  bdrv_has_zero_init is like "was the image all
zeros at creation time", and the answer is no because the content of
the image used to be the same as the backing_hd.

Paolo
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJR6T3bAAoJEBvWZb6bTYbyraEP/04nt2HxFsX+na7q/Ax7bGYn
Z0RvyP2zTukfv8dM9aJVjmQEvLRnp7ROQlSOXPbxF3gzr2nh67nb27Xi+vF0B3N0
fDQTnnUX/8Zz2jigolDCSSFp0ggDWRtVID+W/olfIQMAJI1PkRdt0VJQp9aeLGZV
7tae/Fl5AQtfZfAW6Ji4FZzJcmLDyM7HoveZxnAawGy9e17opKGFmKUmBNutew0C
LoKDkakWRrW/k4IzXB0TJOtQ5hOn5yoG16VjYG1YKblewRqaciFba62T+Z4ClJB0
TKJEr3fuGokI9mPoyPCpuZF0ZPEFFI26Mwocag8TpCS5IYJzJjC7RcGDW5CV4zzC
Lzr3fGeTTtEOCJfdF4GtqeSp2S+1QWuagLLr31cV1KpnuKVLItsQ95gDpZMtqjC0
FKLkF7g/BQrE/UgGzJ16YuF4h+Ct8+2Ihiz0BS6OD6ie2g9Ds/u1l/TCn/jfj+ID
PJhDUcDPynBqy+i1V475w+ZY7lLpE7oOg1HPJvQLqFgc6XkPTI75EttpWB01NOoO
vKfugwxKJBfLFCX/oWbTk04P69dysLJLeb/WedTrqWAdn/to0bhD1aKX2rV+aYx3
Rs+7DescTrGV+n6qcdx2TUEqswCyMrBoStBUwEGuM+S+EVFQ7KitAY83puNUO3zX
qAEpAOUWyqE0WXG3s7Ee
=ltDY
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/17] block: introduce bdrv_get_block_status API
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 09/17] block: introduce bdrv_get_block_status API Paolo Bonzini
@ 2013-07-19 13:43   ` Eric Blake
  2013-07-25 12:13     ` Paolo Bonzini
  0 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2013-07-19 13:43 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 4744 bytes --]

On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
> For now, bdrv_get_block_status is just another name for bdrv_is_allocated.
> The next patches will add more flags.
> 
> This also touches all block drivers with a mostly mechanical rename.  The
> sole exception is cow; because it calls cow_co_is_allocated from the read
> code, we keep that function and make cow_co_get_block_status a wrapper.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: rebase after vmdk changes

Reviewed-by: Eric Blake <eblake@redhat.com>

> 
> @@ -370,7 +376,7 @@ static BlockDriver bdrv_cow = {
>  
>      .bdrv_read              = cow_co_read,
>      .bdrv_write             = cow_co_write,
> -    .bdrv_co_is_allocated   = cow_co_is_allocated,
> +    .bdrv_co_get_block_status   = cow_co_get_block_status,

Is it worth realigning indentation now that you have a longer name?

> +++ b/block/qcow.c
> @@ -395,7 +395,7 @@ static uint64_t get_cluster_offset(BlockDriverState *bs,
>      return cluster_offset;
>  }
>  
> -static int coroutine_fn qcow_co_is_allocated(BlockDriverState *bs,
> +static int64_t coroutine_fn qcow_co_get_block_status(BlockDriverState *bs,
>          int64_t sector_num, int nb_sectors, int *pnum)

Is it worth fixing alignment while you touch this?

>  {
>      BDRVQcowState *s = bs->opaque;
> @@ -896,7 +896,7 @@ static BlockDriver bdrv_qcow = {
>  
>      .bdrv_co_readv          = qcow_co_readv,
>      .bdrv_co_writev         = qcow_co_writev,
> -    .bdrv_co_is_allocated   = qcow_co_is_allocated,
> +    .bdrv_co_get_block_status   = qcow_co_get_block_status,

Another spot for realignment?

> +++ b/block/qcow2.c
> @@ -640,7 +640,7 @@ static int qcow2_reopen_prepare(BDRVReopenState *state,
>      return 0;
>  }
>  
> -static int coroutine_fn qcow2_co_is_allocated(BlockDriverState *bs,
> +static int64_t coroutine_fn qcow2_co_get_block_status(BlockDriverState *bs,
>          int64_t sector_num, int nb_sectors, int *pnum)

alignment?

>  {
>      BDRVQcowState *s = bs->opaque;
> @@ -1784,7 +1784,7 @@ static BlockDriver bdrv_qcow2 = {
>      .bdrv_reopen_prepare  = qcow2_reopen_prepare,
>      .bdrv_create        = qcow2_create,
>      .bdrv_has_zero_init = bdrv_has_zero_init_1,
> -    .bdrv_co_is_allocated = qcow2_co_is_allocated,
> +    .bdrv_co_get_block_status = qcow2_co_get_block_status,
>      .bdrv_set_key       = qcow2_set_key,
>      .bdrv_make_empty    = qcow2_make_empty,
>  

wow, this is already an alignment mess before your change

> +++ b/block/qed.c
> @@ -667,7 +667,7 @@ static void qed_is_allocated_cb(void *opaque, int ret, uint64_t offset, size_t l
>      }
>  }
>  
> -static int coroutine_fn bdrv_qed_co_is_allocated(BlockDriverState *bs,
> +static int64_t coroutine_fn bdrv_qed_co_get_block_status(BlockDriverState *bs,
>                                                   int64_t sector_num,
>                                                   int nb_sectors, int *pnum)

alignment

> +++ b/block/raw-posix.c
> @@ -1084,7 +1084,7 @@ static int raw_create(const char *filename, QEMUOptionParameter *options)
>   * 'nb_sectors' is the max value 'pnum' should be set to.  If nb_sectors goes
>   * beyond the end of the disk image it will be clamped.
>   */
> -static int coroutine_fn raw_co_is_allocated(BlockDriverState *bs,
> +static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
>                                              int64_t sector_num,
>                                              int nb_sectors, int *pnum)

and again

>  {
> @@ -1200,7 +1200,7 @@ static BlockDriver bdrv_file = {
>      .bdrv_close = raw_close,
>      .bdrv_create = raw_create,
>      .bdrv_has_zero_init = bdrv_has_zero_init_1,
> -    .bdrv_co_is_allocated = raw_co_is_allocated,
> +    .bdrv_co_get_block_status = raw_co_get_block_status,
>  
>      .bdrv_aio_readv = raw_aio_readv,
>      .bdrv_aio_writev = raw_aio_writev,

here, nothing was aligned, so you actually met status quo :)

But that raises a question of consistency between drivers...

> +++ b/block/raw.c
> @@ -35,11 +35,11 @@ static void raw_close(BlockDriverState *bs)
>  {
>  }
>  
> -static int coroutine_fn raw_co_is_allocated(BlockDriverState *bs,
> +static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
>                                              int64_t sector_num,
>                                              int nb_sectors, int *pnum)

alignment.  You get the picture; I'll quit pointing it out, since it
doesn't affect semantics.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value Paolo Bonzini
  2013-07-19  6:35   ` Peter Lieven
  2013-07-19  6:48   ` Peter Lieven
@ 2013-07-19 16:52   ` Eric Blake
  2 siblings, 0 replies; 64+ messages in thread
From: Eric Blake @ 2013-07-19 16:52 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 2574 bytes --]

On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
> Define the return value of get_block_status.  Bits 0, 1, 2 and 9-62
> are valid; bit 63 (the sign bit) is reserved for errors.  Bits 3-7

bits 3-8, actually

> are left for future extensions.
> 
> The return code is compatible with the old is_allocated API: returning
> just 0 or 1 (aka BDRV_BLOCK_DATA) will not cause any behavioral change
> in clients of is_allocated.  We will return more precise information
> in the next patches.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: improved comment
> 
>  block.c               |  7 +++++--
>  include/block/block.h | 26 ++++++++++++++++++++++++++
>  2 files changed, 31 insertions(+), 2 deletions(-)
> 

> +++ b/include/block/block.h
> @@ -81,6 +81,32 @@ typedef struct BlockDevOps {
>  #define BDRV_SECTOR_SIZE   (1ULL << BDRV_SECTOR_BITS)
>  #define BDRV_SECTOR_MASK   ~(BDRV_SECTOR_SIZE - 1)
>  
> +/* BDRV_BLOCK_DATA: data is read from bs->file or another file
> + * BDRV_BLOCK_ZERO: sectors read as zero
> + * BDRV_BLOCK_OFFSET_VALID: sector stored in bs->file as raw data
> + *
> + * If BDRV_BLOCK_OFFSET_VALID is set, bits 9-62 represent the offset in
> + * bs->file where sector data can be read from as raw data.

And handy that the offset happens to be aligned to the current 512
sector size, so you can just mask without shifting to get an aligned
offset to the start of the sector.

> + *
> + * DATA == 0 && ZERO == 0 means that data is read from backing_hd if present.
> + *
> + * DATA ZERO OFFSET_VALID
> + *  t    t        t       sectors read as zero, bs->file is zero at offset
> + *  t    f        t       sectors read as valid from bs->file at offset
> + *  f    t        t       sectors preallocated, read as zero, bs->file not
> + *                        necessarily zero at offset
> + *  f    f        t       sectors preallocated but read from backing_hd,
> + *                        bs->file contains garbage at offset
> + *  t    t        f       sectors preallocated, read as zero, unknown offset
> + *  t    f        f       sectors read from unknown file or offset
> + *  f    t        f       not allocated or unknown offset, read as zero
> + *  f    f        f       not allocated or unknown offset, read from backing_hd

Makes sense.

No further comments beyond other reviewers, if you want to add:
Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 17/17] block: look for zero blocks in bs->file
  2013-07-19  7:33   ` Stefan Hajnoczi
@ 2013-07-19 17:07     ` Paolo Bonzini
  2013-07-23 21:18       ` Peter Lieven
  0 siblings, 1 reply; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-19 17:07 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: pl, famz, qemu-devel, stefanha

Il 19/07/2013 09:33, Stefan Hajnoczi ha scritto:
> On Tue, Jul 16, 2013 at 06:29:28PM +0200, Paolo Bonzini wrote:
>> diff --git a/block.c b/block.c
>> index 557ce29..2d7d71f 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -2977,7 +2977,7 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>>                                                       int nb_sectors, int *pnum)
>>  {
>>      int64_t n;
>> -    int64_t ret;
>> +    int64_t ret, ret2;
>>  
>>      if (sector_num >= bs->total_sectors) {
>>          *pnum = 0;
>> @@ -3003,6 +3003,14 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>>          ret |= BDRV_BLOCK_ZERO;
>>      }
>>  
>> +    if (bs->file &&
>> +        (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
>> +        (ret & BDRV_BLOCK_OFFSET_VALID)) {
>> +        ret2 = bdrv_co_get_block_status(bs->file, ret >> BDRV_SECTOR_BITS,
>> +                                        *pnum, pnum);
>> +        ret |= (ret2 & BDRV_BLOCK_ZERO);
>> +    }
> 
> This patch breaks qemu-iotests 030 (image streaming).
> 
> The problem is that bdrv_co_get_block_status() uses bs->total_sectors
> directly instead of calling bdv_get_geometry()/bdrv_getlength().
> 
> With qcow2 the bs->file can grow on disk.  We don't update
> bs->total_sectors.
> 
> Then this patch calls bdrv_co_get_block_status(bs->file) where we fail
> with *pnum = 0, ret = 0 because bs->total_sectors suggests it is beyond
> the end of the file.
> 
> The result is that 030 goes into an infinite loop.
> 
> As a quick test I switched the direct bs->total_sectors accesses to
> bdrv_get_geometry() and it stopped hanging.  Perhaps the
> bs->total_sectors caching needs to be improved though.

Yes, fixing the caching also resolves the failure.  I'll send a v3 next
Monday or perhaps Sunday.

Paolo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 11/17] block: return get_block_status data and flags for formats
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 11/17] block: return get_block_status data and flags for formats Paolo Bonzini
@ 2013-07-19 19:25   ` Eric Blake
  0 siblings, 0 replies; 64+ messages in thread
From: Eric Blake @ 2013-07-19 19:25 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 2621 bytes --]

On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: handle extents that are stored in bs->file
> 
>  block/cow.c      |  8 +++++++-
>  block/qcow.c     |  9 ++++++++-
>  block/qcow2.c    | 16 ++++++++++++++--
>  block/qed.c      | 35 ++++++++++++++++++++++++++++-------
>  block/sheepdog.c |  2 +-
>  block/vdi.c      | 13 ++++++++++++-
>  block/vmdk.c     | 19 ++++++++++++++++++-
>  block/vvfat.c    | 11 ++++++-----
>  8 files changed, 94 insertions(+), 19 deletions(-)
> 
> +++ b/block/qcow.c
> @@ -410,7 +410,14 @@ static int64_t coroutine_fn qcow_co_get_block_status(BlockDriverState *bs,
>      if (n > nb_sectors)
>          n = nb_sectors;
>      *pnum = n;
> -    return (cluster_offset != 0);
> +    if (!cluster_offset) {
> +	return 0;

TAB damage.

> +++ b/block/qed.c
> @@ -652,16 +652,36 @@ static int bdrv_qed_create(const char *filename, QEMUOptionParameter *options)
>  }
>  
>  typedef struct {
> +    BlockDriverState *bs;
>      Coroutine *co;
> -    int is_allocated;
> +    uint64_t pos;
> +    int64_t status;
>      int *pnum;
>  } QEDIsAllocatedCB;
>  
>  static void qed_is_allocated_cb(void *opaque, int ret, uint64_t offset, size_t len)
>  {
>      QEDIsAllocatedCB *cb = opaque;
> +    BDRVQEDState *s = cb->bs->opaque;
>      *cb->pnum = len / BDRV_SECTOR_SIZE;
> -    cb->is_allocated = (ret == QED_CLUSTER_FOUND || ret == QED_CLUSTER_ZERO);
> +    switch (ret) {
> +    case QED_CLUSTER_FOUND:
> +       offset |= qed_offset_into_cluster(s, cb->pos);
> +	cb->status = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | offset;
> +	break;

Even more mix of inconsistent use of TAB.

> +++ b/block/vdi.c
> @@ -479,12 +479,23 @@ static int64_t coroutine_fn vdi_co_get_block_status(BlockDriverState *bs,
>      size_t sector_in_block = sector_num % s->block_sectors;
>      int n_sectors = s->block_sectors - sector_in_block;
>      uint32_t bmap_entry = le32_to_cpu(s->bmap[bmap_index]);
> +    uint64_t offset;
> +    int result;
> +
>      logout("%p, %" PRId64 ", %d, %p\n", bs, sector_num, nb_sectors, pnum);
>      if (n_sectors > nb_sectors) {
>          n_sectors = nb_sectors;
>      }
>      *pnum = n_sectors;
> -    return VDI_IS_ALLOCATED(bmap_entry);
> +    result = VDI_IS_ALLOCATED(bmap_entry);
> +    if (!result) {
> +	return 0;

On a roll with TABs.

As that's whitespace only,

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 12/17] qemu-img: add a "map" subcommand
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 12/17] qemu-img: add a "map" subcommand Paolo Bonzini
  2013-07-18  7:25   ` Fam Zheng
@ 2013-07-19 19:36   ` Eric Blake
  1 sibling, 0 replies; 64+ messages in thread
From: Eric Blake @ 2013-07-19 19:36 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 743 bytes --]

On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
> This command dumps the metadata of an entire chain, in either tabular or JSON
> format.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>         v1->v2: mention encrypted clusters, use PRId64
> 
>  qemu-img-cmds.hx |   6 ++
>  qemu-img.c       | 185 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 191 insertions(+)

Missing a change to qemu-img.texi to list the documentation required there.

It might also help to show a sample of the output in the commit message.

I didn't spot anything else beyond your other reviewers.

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 13/17] block: use bdrv_has_zero_init to return BDRV_BLOCK_ZERO
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 13/17] block: use bdrv_has_zero_init to return BDRV_BLOCK_ZERO Paolo Bonzini
  2013-07-17 17:50   ` Peter Lieven
@ 2013-07-19 19:37   ` Eric Blake
  1 sibling, 0 replies; 64+ messages in thread
From: Eric Blake @ 2013-07-19 19:37 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 437 bytes --]

On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
> Alternatively, this could use a "discard zeroes data" flag returned
> by bdrv_get_info.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 14/17] raw-posix: return get_block_status data and flags
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 14/17] raw-posix: return get_block_status data and flags Paolo Bonzini
@ 2013-07-19 19:48   ` Eric Blake
  2013-07-25 12:16     ` Paolo Bonzini
  0 siblings, 1 reply; 64+ messages in thread
From: Eric Blake @ 2013-07-19 19:48 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 1244 bytes --]

On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/raw-posix.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
> 

> +++ b/block/raw-posix.c
> @@ -1089,7 +1089,7 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
>                                              int nb_sectors, int *pnum)
>  {
>      off_t start, data, hole;
> -    int ret;
> +    int64_t ret;
>  
>      ret = fd_open(bs);
>      if (ret < 0) {
> @@ -1097,6 +1097,7 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
>      }
>  
>      start = sector_num * BDRV_SECTOR_SIZE;
> +    ret = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;

off_t is a signed type; if you are compiling on a platform with 32-bit
off_t, is it possible that you will get unintended sign extension for
values of 'start' between 2 and 4 GB?  Or are such files already
impossible to open?  [Or do we intentionally require off_t be 64-bits on
all platforms we care about?]

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 15/17] raw-posix: detect XFS unwritten extents
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 15/17] raw-posix: detect XFS unwritten extents Paolo Bonzini
@ 2013-07-19 20:10   ` Eric Blake
  2013-08-02 15:05   ` Christoph Hellwig
  1 sibling, 0 replies; 64+ messages in thread
From: Eric Blake @ 2013-07-19 20:10 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 804 bytes --]

On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block/raw-posix.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/block/raw-posix.c b/block/raw-posix.c
> index d011cfd..1b41ea3 100644
> --- a/block/raw-posix.c
> +++ b/block/raw-posix.c
> @@ -1128,6 +1128,9 @@ static int64_t coroutine_fn raw_co_get_block_status(BlockDriverState *bs,
>      } else {
>          data = f.fe.fe_logical;
>          hole = f.fe.fe_logical + f.fe.fe_length;
> +        if (f.fe.fe_flags & FIEMAP_EXTENT_UNWRITTEN) {
> +            ret |= BDRV_BLOCK_ZERO;
> +        }
>      }

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 16/17] block: add default get_block_status implementation for protocols
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 16/17] block: add default get_block_status implementation for protocols Paolo Bonzini
@ 2013-07-19 20:11   ` Eric Blake
  0 siblings, 0 replies; 64+ messages in thread
From: Eric Blake @ 2013-07-19 20:11 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 433 bytes --]

On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
> Protocols return raw data, so you can assume the offsets to pass
> through unchanged.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 17/17] block: look for zero blocks in bs->file
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 17/17] block: look for zero blocks in bs->file Paolo Bonzini
  2013-07-19  7:33   ` Stefan Hajnoczi
@ 2013-07-19 20:12   ` Eric Blake
  1 sibling, 0 replies; 64+ messages in thread
From: Eric Blake @ 2013-07-19 20:12 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

[-- Attachment #1: Type: text/plain, Size: 348 bytes --]

On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  block.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 

Reviewed-by: Eric Blake <eblake@redhat.com>

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata
  2013-07-19 12:14       ` Paolo Bonzini
@ 2013-07-20  0:19         ` Wenchao Xia
  0 siblings, 0 replies; 64+ messages in thread
From: Wenchao Xia @ 2013-07-20  0:19 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

于 2013-7-19 20:14, Paolo Bonzini 写道:
> Il 19/07/2013 12:48, Wenchao Xia ha scritto:
>>    I think dump of allocated info in qemu-img in this series, can do
>> dirty change tracking job for backing chain snapshot now, qemu's QMP
>> interface is not very needed.
>>    Only thing not perfect, is that it talks with string.
>
> It uses JSON, actually.
>
>> A library would be better, but I am not selling libqblock:),
>
> Oh you should!  Are you going back to it?  It was very close to
> mergeable state.
>
   Thanks for your positive support, it is encouraging to me.
I'll check about the block's status in qemu and go for this library
later, with a discuss with more detail such about thread, co-routine,
code organization, etc.

> Paolo
>


-- 
Best Regards

Wenchao Xia

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value
  2013-07-19 13:06           ` Peter Lieven
@ 2013-07-20  7:00             ` Paolo Bonzini
  2013-07-23 21:18               ` Peter Lieven
  0 siblings, 1 reply; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-20  7:00 UTC (permalink / raw)
  To: Peter Lieven; +Cc: famz, qemu-devel, stefanha

Il 19/07/2013 15:06, Peter Lieven ha scritto:
>>>> Il 19/07/2013 08:48, Peter Lieven ha scritto:
>>>>>> -    return bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
>>>>>> +    int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors,
>>>>>> pnum);
>>>>>> +    return
>>>>>> +        (ret & BDRV_BLOCK_DATA) ||
>>>>>> +        ((ret & BDRV_BLOCK_ZERO) && !bdrv_has_zero_init(bs));
>>>>> i do also not understand the "((ret & BDRV_BLOCK_ZERO) &&
>>>>> !bdrv_has_zero_init(bs))";
>>>>> if a block is unallocated and reads as zero, but the device lacks zero
>>>>> init, it is declared as allocated with this, isn't it?
> 
> If it is zero and allocated the API should return only BDRV_BLOCK_DATA
> and if it is zero and unallocated only BDRV_BLOCK_ZERO or not?
> 
> What I mean is the new API shouldn't change the behaviour of the old
> bdrv_is_allocated().
> It would have returned
> 
> (ret & BDRV_BLOCK_DATA) regardless if BDRV_BLOCK_ZERO or not.

bdrv_is_allocated must return true for some zero clusters, even
if BDRV_BLOCK_DATA = 0.  See

commit 381b487d54ba18c73df9db8452028a330058c505
Author: Paolo Bonzini <pbonzini@redhat.com>
Date:   Wed Mar 6 18:02:01 2013 +0100

    qcow2: make is_allocated return true for zero clusters
    
    Otherwise, live migration of the top layer will miss zero clusters and
    let the backing file show through.  This also matches what is done in qed.
    
    QCOW2_CLUSTER_ZERO clusters are invalid in v2 image files.  Check this
    directly in qcow2_get_cluster_offset instead of replicating the test
    everywhere.
    
    Cc: qemu-stable@nongnu.org
    Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

I think the source of the confusion is that SCSI "GET LBA STATUS"
does not have to deal with backing files, bdrv_is_allocated must.
If bs->backing_hd != NULL, bdrv_is_allocated is not about allocation
of blocks in the SCSI sense; it's a query for "does the block come
from this BDS or rather from its backing file?".

So this patch must take those slightly different semantics into
account.

Paolo

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value
  2013-07-20  7:00             ` Paolo Bonzini
@ 2013-07-23 21:18               ` Peter Lieven
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Lieven @ 2013-07-23 21:18 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, qemu-devel, stefanha

Am 20.07.2013 09:00, schrieb Paolo Bonzini:
> Il 19/07/2013 15:06, Peter Lieven ha scritto:
>>>>> Il 19/07/2013 08:48, Peter Lieven ha scritto:
>>>>>>> -    return bdrv_get_block_status(bs, sector_num, nb_sectors, pnum);
>>>>>>> +    int64_t ret = bdrv_get_block_status(bs, sector_num, nb_sectors,
>>>>>>> pnum);
>>>>>>> +    return
>>>>>>> +        (ret & BDRV_BLOCK_DATA) ||
>>>>>>> +        ((ret & BDRV_BLOCK_ZERO) && !bdrv_has_zero_init(bs));
>>>>>> i do also not understand the "((ret & BDRV_BLOCK_ZERO) &&
>>>>>> !bdrv_has_zero_init(bs))";
>>>>>> if a block is unallocated and reads as zero, but the device lacks zero
>>>>>> init, it is declared as allocated with this, isn't it?
>> If it is zero and allocated the API should return only BDRV_BLOCK_DATA
>> and if it is zero and unallocated only BDRV_BLOCK_ZERO or not?
>>
>> What I mean is the new API shouldn't change the behaviour of the old
>> bdrv_is_allocated().
>> It would have returned
>>
>> (ret & BDRV_BLOCK_DATA) regardless if BDRV_BLOCK_ZERO or not.
> bdrv_is_allocated must return true for some zero clusters, even
> if BDRV_BLOCK_DATA = 0.  See
>
> commit 381b487d54ba18c73df9db8452028a330058c505
> Author: Paolo Bonzini <pbonzini@redhat.com>
> Date:   Wed Mar 6 18:02:01 2013 +0100
>
>     qcow2: make is_allocated return true for zero clusters
>     
>     Otherwise, live migration of the top layer will miss zero clusters and
>     let the backing file show through.  This also matches what is done in qed.
>     
>     QCOW2_CLUSTER_ZERO clusters are invalid in v2 image files.  Check this
>     directly in qcow2_get_cluster_offset instead of replicating the test
>     everywhere.
>     
>     Cc: qemu-stable@nongnu.org
>     Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>     Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>
> I think the source of the confusion is that SCSI "GET LBA STATUS"
> does not have to deal with backing files, bdrv_is_allocated must.
> If bs->backing_hd != NULL, bdrv_is_allocated is not about allocation
> of blocks in the SCSI sense; it's a query for "does the block come
> from this BDS or rather from its backing file?".
Ok, this makes it clearer. Thanks for pointing that out.
the bdrv_get_block_status() will add the possibility to
check for the allocation status which is a good thing.

Peter

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 17/17] block: look for zero blocks in bs->file
  2013-07-19 17:07     ` Paolo Bonzini
@ 2013-07-23 21:18       ` Peter Lieven
  0 siblings, 0 replies; 64+ messages in thread
From: Peter Lieven @ 2013-07-23 21:18 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Stefan Hajnoczi, famz, qemu-devel, stefanha

Am 19.07.2013 19:07, schrieb Paolo Bonzini:
> Il 19/07/2013 09:33, Stefan Hajnoczi ha scritto:
>> On Tue, Jul 16, 2013 at 06:29:28PM +0200, Paolo Bonzini wrote:
>>> diff --git a/block.c b/block.c
>>> index 557ce29..2d7d71f 100644
>>> --- a/block.c
>>> +++ b/block.c
>>> @@ -2977,7 +2977,7 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>>>                                                       int nb_sectors, int *pnum)
>>>  {
>>>      int64_t n;
>>> -    int64_t ret;
>>> +    int64_t ret, ret2;
>>>  
>>>      if (sector_num >= bs->total_sectors) {
>>>          *pnum = 0;
>>> @@ -3003,6 +3003,14 @@ static int64_t coroutine_fn bdrv_co_get_block_status(BlockDriverState *bs,
>>>          ret |= BDRV_BLOCK_ZERO;
>>>      }
>>>  
>>> +    if (bs->file &&
>>> +        (ret & BDRV_BLOCK_DATA) && !(ret & BDRV_BLOCK_ZERO) &&
>>> +        (ret & BDRV_BLOCK_OFFSET_VALID)) {
>>> +        ret2 = bdrv_co_get_block_status(bs->file, ret >> BDRV_SECTOR_BITS,
>>> +                                        *pnum, pnum);
>>> +        ret |= (ret2 & BDRV_BLOCK_ZERO);
>>> +    }
>> This patch breaks qemu-iotests 030 (image streaming).
>>
>> The problem is that bdrv_co_get_block_status() uses bs->total_sectors
>> directly instead of calling bdv_get_geometry()/bdrv_getlength().
>>
>> With qcow2 the bs->file can grow on disk.  We don't update
>> bs->total_sectors.
>>
>> Then this patch calls bdrv_co_get_block_status(bs->file) where we fail
>> with *pnum = 0, ret = 0 because bs->total_sectors suggests it is beyond
>> the end of the file.
>>
>> The result is that 030 goes into an infinite loop.
>>
>> As a quick test I switched the direct bs->total_sectors accesses to
>> bdrv_get_geometry() and it stopped hanging.  Perhaps the
>> bs->total_sectors caching needs to be improved though.
> Yes, fixing the caching also resolves the failure.  I'll send a v3 next
> Monday or perhaps Sunday.
Do you have the v3 ready?

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 09/17] block: introduce bdrv_get_block_status API
  2013-07-19 13:43   ` Eric Blake
@ 2013-07-25 12:13     ` Paolo Bonzini
  0 siblings, 0 replies; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-25 12:13 UTC (permalink / raw)
  To: qemu-devel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Il 19/07/2013 15:43, Eric Blake ha scritto:
> On 07/16/2013 10:29 AM, Paolo Bonzini wrote:
>> For now, bdrv_get_block_status is just another name for
>> bdrv_is_allocated. The next patches will add more flags.
>> 
>> This also touches all block drivers with a mostly mechanical
>> rename.  The sole exception is cow; because it calls
>> cow_co_is_allocated from the read code, we keep that function and
>> make cow_co_get_block_status a wrapper.
>> 
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- v1->v2:
>> rebase after vmdk changes
> 
> Reviewed-by: Eric Blake <eblake@redhat.com>
> 
>> 
>> @@ -370,7 +376,7 @@ static BlockDriver bdrv_cow = {
>> 
>> .bdrv_read              = cow_co_read, .bdrv_write             =
>> cow_co_write, -    .bdrv_co_is_allocated   =
>> cow_co_is_allocated, +    .bdrv_co_get_block_status   =
>> cow_co_get_block_status,
> 
> Is it worth realigning indentation now that you have a longer
> name?
> 
>> +++ b/block/qcow.c @@ -395,7 +395,7 @@ static uint64_t
>> get_cluster_offset(BlockDriverState *bs, return cluster_offset; 
>> }
>> 
>> -static int coroutine_fn qcow_co_is_allocated(BlockDriverState
>> *bs, +static int64_t coroutine_fn
>> qcow_co_get_block_status(BlockDriverState *bs, int64_t
>> sector_num, int nb_sectors, int *pnum)
> 
> Is it worth fixing alignment while you touch this?
> 
>> { BDRVQcowState *s = bs->opaque; @@ -896,7 +896,7 @@ static
>> BlockDriver bdrv_qcow = {
>> 
>> .bdrv_co_readv          = qcow_co_readv, .bdrv_co_writev
>> = qcow_co_writev, -    .bdrv_co_is_allocated   =
>> qcow_co_is_allocated, +    .bdrv_co_get_block_status   =
>> qcow_co_get_block_status,
> 
> Another spot for realignment?
> 
>> +++ b/block/qcow2.c @@ -640,7 +640,7 @@ static int
>> qcow2_reopen_prepare(BDRVReopenState *state, return 0; }
>> 
>> -static int coroutine_fn qcow2_co_is_allocated(BlockDriverState
>> *bs, +static int64_t coroutine_fn
>> qcow2_co_get_block_status(BlockDriverState *bs, int64_t
>> sector_num, int nb_sectors, int *pnum)
> 
> alignment?
> 
>> { BDRVQcowState *s = bs->opaque; @@ -1784,7 +1784,7 @@ static
>> BlockDriver bdrv_qcow2 = { .bdrv_reopen_prepare  =
>> qcow2_reopen_prepare, .bdrv_create        = qcow2_create, 
>> .bdrv_has_zero_init = bdrv_has_zero_init_1, -
>> .bdrv_co_is_allocated = qcow2_co_is_allocated, +
>> .bdrv_co_get_block_status = qcow2_co_get_block_status, 
>> .bdrv_set_key       = qcow2_set_key, .bdrv_make_empty    =
>> qcow2_make_empty,
>> 
> 
> wow, this is already an alignment mess before your change
> 
>> +++ b/block/qed.c @@ -667,7 +667,7 @@ static void
>> qed_is_allocated_cb(void *opaque, int ret, uint64_t offset,
>> size_t l } }
>> 
>> -static int coroutine_fn
>> bdrv_qed_co_is_allocated(BlockDriverState *bs, +static int64_t
>> coroutine_fn bdrv_qed_co_get_block_status(BlockDriverState *bs, 
>> int64_t sector_num, int nb_sectors, int *pnum)
> 
> alignment
> 
>> +++ b/block/raw-posix.c @@ -1084,7 +1084,7 @@ static int
>> raw_create(const char *filename, QEMUOptionParameter *options) *
>> 'nb_sectors' is the max value 'pnum' should be set to.  If
>> nb_sectors goes * beyond the end of the disk image it will be
>> clamped. */ -static int coroutine_fn
>> raw_co_is_allocated(BlockDriverState *bs, +static int64_t
>> coroutine_fn raw_co_get_block_status(BlockDriverState *bs, 
>> int64_t sector_num, int nb_sectors, int *pnum)
> 
> and again
> 
>> { @@ -1200,7 +1200,7 @@ static BlockDriver bdrv_file = { 
>> .bdrv_close = raw_close, .bdrv_create = raw_create, 
>> .bdrv_has_zero_init = bdrv_has_zero_init_1, -
>> .bdrv_co_is_allocated = raw_co_is_allocated, +
>> .bdrv_co_get_block_status = raw_co_get_block_status,
>> 
>> .bdrv_aio_readv = raw_aio_readv, .bdrv_aio_writev =
>> raw_aio_writev,
> 
> here, nothing was aligned, so you actually met status quo :)
> 
> But that raises a question of consistency between drivers...
> 
>> +++ b/block/raw.c @@ -35,11 +35,11 @@ static void
>> raw_close(BlockDriverState *bs) { }
>> 
>> -static int coroutine_fn raw_co_is_allocated(BlockDriverState
>> *bs, +static int64_t coroutine_fn
>> raw_co_get_block_status(BlockDriverState *bs, int64_t
>> sector_num, int nb_sectors, int *pnum)
> 
> alignment.  You get the picture; I'll quit pointing it out, since
> it doesn't affect semantics.

Got it, let's make everything aligned in a separate patch if deemed
useful.  I didn't want to taint "git blame".

Paolo

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJR8RZUAAoJEBvWZb6bTYby2OYP/jDW+qofVnm/sEE0x6j3p/JD
9JlK3kBKIJ3IQLFBWWz7/scNNF4Y0mPZ1U1ylU95F/BZzyntURsCPLGIBrScgx43
L5V1auiHV/aNEWnAD2sHDfkumiS0x8nzq6yEu8go2ldZBQR+2m9gn5wfWTmNs3nD
OEhg9Fzz7Lka5KwtGDezcjthKV92eBBEQCJAkx0qa/Jf4BrwQo3P4wCn+4nOJoii
veQz809EnQ5VAA1GJ6xIq7iRySLAlbeaG1Km+iB/1gDtD46oRoYOOZGjtvzXQWcY
NIVihNsGx9PUYD/uhGhKqxoOJ9Z9Wx6qLadFYi50vFrvBbZ31+Fl5IRrkEgNZb/G
eWwPmXJIfqoEyQ4MNdJSQuQhBjd0Y3KCvSBOSi+jQR05osyzvHF5iSWL6SshPocN
Ck3blVSxjcZtT2ZhwKxcG0Jk1p9YmnaNGOC8yUxxpgMtJCz7U8G/ki1RaK17VVHU
yFXnvvyxfbLl6tpQtPg0MI/XLpyO1BQeJD8C1+q5F16naVczxIB+N/ugKvhWKxNo
A7Ou9LgiXq09R6hVrh1Qlz65TbSRwP/j7y07fYj74n14IXwoyt1ybi4RjmM6d0Po
vA8uRDb7AVxX4XidFlDe5KTQy3ANGFuhX8Du3JwMvc/KqbqK3rFTT+slDdbJISKJ
AeuJHNxM3AYyK53EK4C3
=SxhS
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 14/17] raw-posix: return get_block_status data and flags
  2013-07-19 19:48   ` Eric Blake
@ 2013-07-25 12:16     ` Paolo Bonzini
  0 siblings, 0 replies; 64+ messages in thread
From: Paolo Bonzini @ 2013-07-25 12:16 UTC (permalink / raw)
  To: Eric Blake; +Cc: pl, famz, qemu-devel, stefanha

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Il 19/07/2013 21:48, Eric Blake ha scritto:
>>> start = sector_num * BDRV_SECTOR_SIZE; +    ret =
>>> BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID | start;
> off_t is a signed type; if you are compiling on a platform with
> 32-bit off_t, is it possible that you will get unintended sign
> extension for values of 'start' between 2 and 4 GB?  Or are such
> files already impossible to open?  [Or do we intentionally require
> off_t be 64-bits on all platforms we care about?]

Good question.  If we don't, we should.

In the meanwhile, I'll change start/data/hole to be int64_t.

Paolo
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBAgAGBQJR8RcVAAoJEBvWZb6bTYbymN0P/15/doObltN+2vCE1WtQD8IZ
Woqd/Itct/1KXHCLv423ThO23/WpE3KSPna25a19sGeJQSXdZB7E2shDtxNErEob
a0c0n6yMzDI0iU+UX1/mPacoIZt5tiWHzd3W4sTIgVDjHUa1Xctgp8bv3gNFNHzh
Ese6mdNNhJQCPqA67qLhzSGfnWUCxGV0F9HDK23dfYJUQ8yoEnRWqLYjRH0qgEG4
LRydUCyfU3BjvmRlLLRMrJHxSJKQKbYMO94IY1ZPqKn1TxKPnOHdIYOgNP8wFAWA
3uA5kc0r9qEmSyh01dbQ+bBkp9lRzDRgxrmii4LZXiQvF6kAXxrY5fEuUHrewGwC
1KeFccGLkgWH1jI19M8URjzbjQwGm79yYIqlt4ZLDwam88VXlxYse7odaRESGSWY
HQqak7WSlDFPlvNj7eR3lruGN3E9F6XAoMCVwRnPRbrx9G9ezH302VF7IXHvlFJu
yK+iRwoWUSkKPWVBxiMrI5HAWlt4N6Akje2VPhkaJQYBQ0LwLhMiDWfMJySW9M7I
qNz/CjbMljwEGOEjYZ7PUKd0H76mcX1xwDi3ofBkJn3Bu74+C5xs7ZRRNO/tIu2c
mVxxYbP7R3dTDdF6sOTVIjUt9M7ObDJ0I0Y0aev1BSTv5tiZb9KB+ef9DRn2Drhk
cVcuJYJiEvL3JV9J5zDE
=ANdZ
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 64+ messages in thread

* Re: [Qemu-devel] [PATCH v2 15/17] raw-posix: detect XFS unwritten extents
  2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 15/17] raw-posix: detect XFS unwritten extents Paolo Bonzini
  2013-07-19 20:10   ` Eric Blake
@ 2013-08-02 15:05   ` Christoph Hellwig
  1 sibling, 0 replies; 64+ messages in thread
From: Christoph Hellwig @ 2013-08-02 15:05 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: famz, pl, qemu-devel, stefanha

On Tue, Jul 16, 2013 at 06:29:26PM +0200, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

This isn't really XFS specific, at least ext4 and ocfs2 can report the same.

^ permalink raw reply	[flat|nested] 64+ messages in thread

end of thread, other threads:[~2013-08-02 15:06 UTC | newest]

Thread overview: 64+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-07-16 16:29 [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Paolo Bonzini
2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 01/17] cow: make reads go at a decent speed Paolo Bonzini
2013-07-19 12:31   ` Eric Blake
2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 02/17] cow: make writes go at a less indecent speed Paolo Bonzini
2013-07-19 12:40   ` Eric Blake
2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 03/17] cow: do not call bdrv_co_is_allocated Paolo Bonzini
2013-07-19 12:42   ` Eric Blake
2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 04/17] block: make bdrv_co_is_allocated static Paolo Bonzini
2013-07-19 12:44   ` Eric Blake
2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 05/17] block: remove bdrv_is_allocated_above/bdrv_co_is_allocated_above distinction Paolo Bonzini
2013-07-19 12:46   ` Eric Blake
2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 06/17] block: expect errors from bdrv_co_is_allocated Paolo Bonzini
2013-07-19 12:57   ` Eric Blake
2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 07/17] qemu-img: always probe the input image for allocated sectors Paolo Bonzini
2013-07-19 13:05   ` Eric Blake
2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 08/17] block: make bdrv_has_zero_init return false for copy-on-write-images Paolo Bonzini
2013-07-19 13:21   ` Eric Blake
2013-07-19 13:23     ` Paolo Bonzini
2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 09/17] block: introduce bdrv_get_block_status API Paolo Bonzini
2013-07-19 13:43   ` Eric Blake
2013-07-25 12:13     ` Paolo Bonzini
2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 10/17] block: define get_block_status return value Paolo Bonzini
2013-07-19  6:35   ` Peter Lieven
2013-07-19  9:14     ` Paolo Bonzini
2013-07-19  6:48   ` Peter Lieven
2013-07-19  9:12     ` Paolo Bonzini
2013-07-19  9:58     ` Paolo Bonzini
2013-07-19 10:04       ` Peter Lieven
2013-07-19 12:13         ` Paolo Bonzini
2013-07-19 13:06           ` Peter Lieven
2013-07-20  7:00             ` Paolo Bonzini
2013-07-23 21:18               ` Peter Lieven
2013-07-19 16:52   ` Eric Blake
2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 11/17] block: return get_block_status data and flags for formats Paolo Bonzini
2013-07-19 19:25   ` Eric Blake
2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 12/17] qemu-img: add a "map" subcommand Paolo Bonzini
2013-07-18  7:25   ` Fam Zheng
2013-07-18  8:55     ` Paolo Bonzini
2013-07-19 19:36   ` Eric Blake
2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 13/17] block: use bdrv_has_zero_init to return BDRV_BLOCK_ZERO Paolo Bonzini
2013-07-17 17:50   ` Peter Lieven
2013-07-19 19:37   ` Eric Blake
2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 14/17] raw-posix: return get_block_status data and flags Paolo Bonzini
2013-07-19 19:48   ` Eric Blake
2013-07-25 12:16     ` Paolo Bonzini
2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 15/17] raw-posix: detect XFS unwritten extents Paolo Bonzini
2013-07-19 20:10   ` Eric Blake
2013-08-02 15:05   ` Christoph Hellwig
2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 16/17] block: add default get_block_status implementation for protocols Paolo Bonzini
2013-07-19 20:11   ` Eric Blake
2013-07-16 16:29 ` [Qemu-devel] [PATCH v2 17/17] block: look for zero blocks in bs->file Paolo Bonzini
2013-07-19  7:33   ` Stefan Hajnoczi
2013-07-19 17:07     ` Paolo Bonzini
2013-07-23 21:18       ` Peter Lieven
2013-07-19 20:12   ` Eric Blake
2013-07-16 16:56 ` [Qemu-devel] [PATCH v2 00/17] Add qemu-img subcommand to dump file metadata Eric Blake
2013-07-16 16:57   ` Paolo Bonzini
2013-07-19 10:48     ` Wenchao Xia
2013-07-19 12:14       ` Paolo Bonzini
2013-07-20  0:19         ` Wenchao Xia
2013-07-18  8:17 ` Peter Lieven
2013-07-18  8:58   ` Paolo Bonzini
2013-07-19  4:45 ` Stefan Hajnoczi
2013-07-19  7:34 ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.