All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [patch 0/3] block migration fixes
@ 2010-11-08 19:02 Marcelo Tosatti
  2010-11-08 19:02 ` [Qemu-devel] [patch 1/3] block: fix shift in dirty bitmap calculation Marcelo Tosatti
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Marcelo Tosatti @ 2010-11-08 19:02 UTC (permalink / raw)
  To: Liran Schour, Kevin Wolf; +Cc: Yoshiaki Tamura, qemu-devel

Following patchset fixes block migration corruption issues.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] [patch 1/3] block: fix shift in dirty bitmap calculation
  2010-11-08 19:02 [Qemu-devel] [patch 0/3] block migration fixes Marcelo Tosatti
@ 2010-11-08 19:02 ` Marcelo Tosatti
  2010-11-09 12:02   ` [Qemu-devel] " Kevin Wolf
  2010-11-08 19:02 ` [Qemu-devel] [patch 2/3] block: set sector dirty on AIO write completion Marcelo Tosatti
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Marcelo Tosatti @ 2010-11-08 19:02 UTC (permalink / raw)
  To: Liran Schour, Kevin Wolf; +Cc: Marcelo Tosatti, Yoshiaki Tamura, qemu-devel

[-- Attachment #1: 01-block-dirty-log-shift --]
[-- Type: text/plain, Size: 1378 bytes --]

Otherwise upper 32 bits of bitmap entries are not correctly calculated.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: qemu-kvm/block.c
===================================================================
--- qemu-kvm.orig/block.c
+++ qemu-kvm/block.c
@@ -930,14 +930,14 @@ static void set_dirty_bitmap(BlockDriver
         bit = start % (sizeof(unsigned long) * 8);
         val = bs->dirty_bitmap[idx];
         if (dirty) {
-            if (!(val & (1 << bit))) {
+            if (!(val & (1UL << bit))) {
                 bs->dirty_count++;
-                val |= 1 << bit;
+                val |= 1UL << bit;
             }
         } else {
-            if (val & (1 << bit)) {
+            if (val & (1UL << bit)) {
                 bs->dirty_count--;
-                val &= ~(1 << bit);
+                val &= ~(1UL << bit);
             }
         }
         bs->dirty_bitmap[idx] = val;
@@ -2672,8 +2672,8 @@ int bdrv_get_dirty(BlockDriverState *bs,
 
     if (bs->dirty_bitmap &&
         (sector << BDRV_SECTOR_BITS) < bdrv_getlength(bs)) {
-        return bs->dirty_bitmap[chunk / (sizeof(unsigned long) * 8)] &
-            (1 << (chunk % (sizeof(unsigned long) * 8)));
+        return !!(bs->dirty_bitmap[chunk / (sizeof(unsigned long) * 8)] &
+            (1UL << (chunk % (sizeof(unsigned long) * 8))));
     } else {
         return 0;
     }

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] [patch 2/3] block: set sector dirty on AIO write completion
  2010-11-08 19:02 [Qemu-devel] [patch 0/3] block migration fixes Marcelo Tosatti
  2010-11-08 19:02 ` [Qemu-devel] [patch 1/3] block: fix shift in dirty bitmap calculation Marcelo Tosatti
@ 2010-11-08 19:02 ` Marcelo Tosatti
  2010-11-09 12:02   ` [Qemu-devel] " Kevin Wolf
  2010-11-08 19:02 ` [Qemu-devel] [patch 3/3] block migration: do not submit multiple AIOs for same sector Marcelo Tosatti
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Marcelo Tosatti @ 2010-11-08 19:02 UTC (permalink / raw)
  To: Liran Schour, Kevin Wolf; +Cc: Marcelo Tosatti, Yoshiaki Tamura, qemu-devel

[-- Attachment #1: 02-block-aio-write-dirty-bitmap --]
[-- Type: text/plain, Size: 2411 bytes --]

Sectors are marked dirty in the bitmap on AIO submission. This is wrong
since data has not reached storage.

Set a given sector as dirty in the dirty bitmap on AIO completion, so that
reading a sector marked as dirty is guaranteed to return uptodate data.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Index: qemu-kvm/block.c
===================================================================
--- qemu-kvm.orig/block.c
+++ qemu-kvm/block.c
@@ -2018,12 +2018,49 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDr
     return ret;
 }
 
+typedef struct BlockCompleteData {
+    BlockDriverCompletionFunc *cb;
+    void *opaque;
+    BlockDriverState *bs;
+    int64_t sector_num;
+    int nb_sectors;
+} BlockCompleteData;
+
+static void block_complete_cb(void *opaque, int ret)
+{
+    BlockCompleteData *b = opaque;
+
+    if (b->bs->dirty_bitmap) {
+        set_dirty_bitmap(b->bs, b->sector_num, b->nb_sectors, 1);
+    }
+    b->cb(b->opaque, ret);
+    qemu_free(b);
+}
+
+static BlockCompleteData *blk_dirty_cb_alloc(BlockDriverState *bs,
+                                             int64_t sector_num,
+                                             int nb_sectors,
+                                             BlockDriverCompletionFunc *cb,
+                                             void *opaque)
+{
+    BlockCompleteData *blkdata = qemu_mallocz(sizeof(BlockCompleteData));
+
+    blkdata->bs = bs;
+    blkdata->cb = cb;
+    blkdata->opaque = opaque;
+    blkdata->sector_num = sector_num;
+    blkdata->nb_sectors = nb_sectors;
+
+    return blkdata;
+}
+
 BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num,
                                   QEMUIOVector *qiov, int nb_sectors,
                                   BlockDriverCompletionFunc *cb, void *opaque)
 {
     BlockDriver *drv = bs->drv;
     BlockDriverAIOCB *ret;
+    BlockCompleteData *blk_cb_data;
 
     trace_bdrv_aio_writev(bs, sector_num, nb_sectors, opaque);
 
@@ -2035,7 +2072,10 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockD
         return NULL;
 
     if (bs->dirty_bitmap) {
-        set_dirty_bitmap(bs, sector_num, nb_sectors, 1);
+        blk_cb_data = blk_dirty_cb_alloc(bs, sector_num, nb_sectors, cb,
+                                         opaque);
+        cb = &block_complete_cb;
+        opaque = blk_cb_data;
     }
 
     ret = drv->bdrv_aio_writev(bs, sector_num, qiov, nb_sectors,

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] [patch 3/3] block migration: do not submit multiple AIOs for same sector
  2010-11-08 19:02 [Qemu-devel] [patch 0/3] block migration fixes Marcelo Tosatti
  2010-11-08 19:02 ` [Qemu-devel] [patch 1/3] block: fix shift in dirty bitmap calculation Marcelo Tosatti
  2010-11-08 19:02 ` [Qemu-devel] [patch 2/3] block: set sector dirty on AIO write completion Marcelo Tosatti
@ 2010-11-08 19:02 ` Marcelo Tosatti
  2010-11-09 12:42   ` [Qemu-devel] " Kevin Wolf
  2010-11-09  6:02 ` [Qemu-devel] Re: [patch 0/3] block migration fixes Yoshiaki Tamura
  2010-11-21 15:22 ` [Qemu-devel] " Anthony Liguori
  4 siblings, 1 reply; 14+ messages in thread
From: Marcelo Tosatti @ 2010-11-08 19:02 UTC (permalink / raw)
  To: Liran Schour, Kevin Wolf; +Cc: Marcelo Tosatti, Yoshiaki Tamura, qemu-devel

[-- Attachment #1: 03-block-migration-inflight-read --]
[-- Type: text/plain, Size: 5041 bytes --]

Block migration can submit multiple AIO reads for the same sector/chunk, but
completion of such reads can happen out of order:

migration               guest
- get_dirty(N)
- aio_read(N) 
- clear_dirty(N)
                        write(N)
                        set_dirty(N)
- get_dirty(N)
- aio_read(N)

If the first aio_read completes after the second, stale data will be 
migrated to the destination.

Fix by not allowing multiple AIOs inflight for the same sector.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>


Index: qemu-kvm/block-migration.c
===================================================================
--- qemu-kvm.orig/block-migration.c
+++ qemu-kvm/block-migration.c
@@ -49,12 +49,14 @@ typedef struct BlkMigDevState {
     int64_t total_sectors;
     int64_t dirty;
     QSIMPLEQ_ENTRY(BlkMigDevState) entry;
+    unsigned long *aio_bitmap;
 } BlkMigDevState;
 
 typedef struct BlkMigBlock {
     uint8_t *buf;
     BlkMigDevState *bmds;
     int64_t sector;
+    int nr_sectors;
     struct iovec iov;
     QEMUIOVector qiov;
     BlockDriverAIOCB *aiocb;
@@ -140,6 +142,57 @@ static inline long double compute_read_b
     return  (block_mig_state.reads * BLOCK_SIZE)/ block_mig_state.total_time;
 }
 
+static int bmds_aio_inflight(BlkMigDevState *bmds, int64_t sector)
+{
+    int64_t chunk = sector / (int64_t)BDRV_SECTORS_PER_DIRTY_CHUNK;
+
+    if (bmds->aio_bitmap &&
+        (sector << BDRV_SECTOR_BITS) < bdrv_getlength(bmds->bs)) {
+        return !!(bmds->aio_bitmap[chunk / (sizeof(unsigned long) * 8)] &
+            (1UL << (chunk % (sizeof(unsigned long) * 8))));
+    } else {
+        return 0;
+    }
+}
+
+static void bmds_set_aio_inflight(BlkMigDevState *bmds, int64_t sector_num,
+                             int nb_sectors, int set)
+{
+    int64_t start, end;
+    unsigned long val, idx, bit;
+
+    start = sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK;
+    end = (sector_num + nb_sectors - 1) / BDRV_SECTORS_PER_DIRTY_CHUNK;
+
+    for (; start <= end; start++) {
+        idx = start / (sizeof(unsigned long) * 8);
+        bit = start % (sizeof(unsigned long) * 8);
+        val = bmds->aio_bitmap[idx];
+        if (set) {
+            if (!(val & (1UL << bit))) {
+                val |= 1UL << bit;
+            }
+        } else {
+            if (val & (1UL << bit)) {
+                val &= ~(1UL << bit);
+            }
+        }
+        bmds->aio_bitmap[idx] = val;
+    }
+}
+
+static void alloc_aio_bitmap(BlkMigDevState *bmds)
+{
+    BlockDriverState *bs = bmds->bs;
+    int64_t bitmap_size;
+
+    bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS) +
+            BDRV_SECTORS_PER_DIRTY_CHUNK * 8 - 1;
+    bitmap_size /= BDRV_SECTORS_PER_DIRTY_CHUNK * 8;
+
+    bmds->aio_bitmap = qemu_mallocz(bitmap_size);
+}
+
 static void blk_mig_read_cb(void *opaque, int ret)
 {
     BlkMigBlock *blk = opaque;
@@ -151,6 +204,7 @@ static void blk_mig_read_cb(void *opaque
     add_avg_read_time(blk->time);
 
     QSIMPLEQ_INSERT_TAIL(&block_mig_state.blk_list, blk, entry);
+    bmds_set_aio_inflight(blk->bmds, blk->sector, blk->nr_sectors, 0);
 
     block_mig_state.submitted--;
     block_mig_state.read_done++;
@@ -194,6 +248,7 @@ static int mig_save_device_bulk(Monitor 
     blk->buf = qemu_malloc(BLOCK_SIZE);
     blk->bmds = bmds;
     blk->sector = cur_sector;
+    blk->nr_sectors = nr_sectors;
 
     blk->iov.iov_base = blk->buf;
     blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
@@ -248,6 +303,7 @@ static void init_blk_migration_it(void *
         bmds->total_sectors = sectors;
         bmds->completed_sectors = 0;
         bmds->shared_base = block_mig_state.shared_base;
+        alloc_aio_bitmap(bmds);
 
         block_mig_state.total_sector_sum += sectors;
 
@@ -329,6 +385,8 @@ static int mig_save_device_dirty(Monitor
     int nr_sectors;
 
     for (sector = bmds->cur_dirty; sector < bmds->total_sectors;) {
+        if (bmds_aio_inflight(bmds, sector))
+            qemu_aio_flush();
         if (bdrv_get_dirty(bmds->bs, sector)) {
 
             if (total_sectors - sector < BDRV_SECTORS_PER_DIRTY_CHUNK) {
@@ -340,6 +398,7 @@ static int mig_save_device_dirty(Monitor
             blk->buf = qemu_malloc(BLOCK_SIZE);
             blk->bmds = bmds;
             blk->sector = sector;
+            blk->nr_sectors = nr_sectors;
 
             if (is_async) {
                 blk->iov.iov_base = blk->buf;
@@ -354,6 +413,7 @@ static int mig_save_device_dirty(Monitor
                     goto error;
                 }
                 block_mig_state.submitted++;
+                bmds_set_aio_inflight(bmds, sector, nr_sectors, 1);
             } else {
                 if (bdrv_read(bmds->bs, sector, blk->buf,
                               nr_sectors) < 0) {
@@ -474,6 +534,7 @@ static void blk_mig_cleanup(Monitor *mon
 
     while ((bmds = QSIMPLEQ_FIRST(&block_mig_state.bmds_list)) != NULL) {
         QSIMPLEQ_REMOVE_HEAD(&block_mig_state.bmds_list, entry);
+        qemu_free(bmds->aio_bitmap);
         qemu_free(bmds);
     }
 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] Re: [patch 0/3] block migration fixes
  2010-11-08 19:02 [Qemu-devel] [patch 0/3] block migration fixes Marcelo Tosatti
                   ` (2 preceding siblings ...)
  2010-11-08 19:02 ` [Qemu-devel] [patch 3/3] block migration: do not submit multiple AIOs for same sector Marcelo Tosatti
@ 2010-11-09  6:02 ` Yoshiaki Tamura
  2010-11-09 13:08   ` Marcelo Tosatti
  2010-11-21 15:22 ` [Qemu-devel] " Anthony Liguori
  4 siblings, 1 reply; 14+ messages in thread
From: Yoshiaki Tamura @ 2010-11-09  6:02 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Kevin Wolf, ohmura.kei, qemu-devel, Liran Schour

Marcelo Tosatti wrote:
> Following patchset fixes block migration corruption issues.

Hi Marcelo,

Thanks for looking into this issue.  Although we tried your patches, we're still
seeing the corruption.  If we execute block migration while copying a file
locally, md5sum of the copied file doesn't match with the original.  Sometimes,
the filesystem returns an I/O error.

Could you let us know how you tested and debugged?  Did you use blkverify?

Thanks,

Yoshi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] Re: [patch 1/3] block: fix shift in dirty bitmap calculation
  2010-11-08 19:02 ` [Qemu-devel] [patch 1/3] block: fix shift in dirty bitmap calculation Marcelo Tosatti
@ 2010-11-09 12:02   ` Kevin Wolf
  0 siblings, 0 replies; 14+ messages in thread
From: Kevin Wolf @ 2010-11-09 12:02 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Yoshiaki Tamura, Liran Schour, qemu-devel

Am 08.11.2010 20:02, schrieb Marcelo Tosatti:
> Otherwise upper 32 bits of bitmap entries are not correctly calculated.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] Re: [patch 2/3] block: set sector dirty on AIO write completion
  2010-11-08 19:02 ` [Qemu-devel] [patch 2/3] block: set sector dirty on AIO write completion Marcelo Tosatti
@ 2010-11-09 12:02   ` Kevin Wolf
  0 siblings, 0 replies; 14+ messages in thread
From: Kevin Wolf @ 2010-11-09 12:02 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Yoshiaki Tamura, Liran Schour, qemu-devel

Am 08.11.2010 20:02, schrieb Marcelo Tosatti:
> Sectors are marked dirty in the bitmap on AIO submission. This is wrong
> since data has not reached storage.
> 
> Set a given sector as dirty in the dirty bitmap on AIO completion, so that
> reading a sector marked as dirty is guaranteed to return uptodate data.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>

Reviewed-by: Kevin Wolf <kwolf@redhat.com>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] Re: [patch 3/3] block migration: do not submit multiple AIOs for same sector
  2010-11-08 19:02 ` [Qemu-devel] [patch 3/3] block migration: do not submit multiple AIOs for same sector Marcelo Tosatti
@ 2010-11-09 12:42   ` Kevin Wolf
  2010-11-09 13:49     ` Marcelo Tosatti
  0 siblings, 1 reply; 14+ messages in thread
From: Kevin Wolf @ 2010-11-09 12:42 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Yoshiaki Tamura, Liran Schour, qemu-devel

Am 08.11.2010 20:02, schrieb Marcelo Tosatti:
> Block migration can submit multiple AIO reads for the same sector/chunk, but
> completion of such reads can happen out of order:
> 
> migration               guest
> - get_dirty(N)
> - aio_read(N) 
> - clear_dirty(N)
>                         write(N)
>                         set_dirty(N)
> - get_dirty(N)
> - aio_read(N)
> 
> If the first aio_read completes after the second, stale data will be 
> migrated to the destination.
> 
> Fix by not allowing multiple AIOs inflight for the same sector.
> 
> Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
> 
> 
> Index: qemu-kvm/block-migration.c
> ===================================================================
> --- qemu-kvm.orig/block-migration.c
> +++ qemu-kvm/block-migration.c
> @@ -49,12 +49,14 @@ typedef struct BlkMigDevState {
>      int64_t total_sectors;
>      int64_t dirty;
>      QSIMPLEQ_ENTRY(BlkMigDevState) entry;
> +    unsigned long *aio_bitmap;
>  } BlkMigDevState;
>  
>  typedef struct BlkMigBlock {
>      uint8_t *buf;
>      BlkMigDevState *bmds;
>      int64_t sector;
> +    int nr_sectors;
>      struct iovec iov;
>      QEMUIOVector qiov;
>      BlockDriverAIOCB *aiocb;
> @@ -140,6 +142,57 @@ static inline long double compute_read_b
>      return  (block_mig_state.reads * BLOCK_SIZE)/ block_mig_state.total_time;
>  }
>  
> +static int bmds_aio_inflight(BlkMigDevState *bmds, int64_t sector)
> +{
> +    int64_t chunk = sector / (int64_t)BDRV_SECTORS_PER_DIRTY_CHUNK;
> +
> +    if (bmds->aio_bitmap &&

Is bmds->aio_bitmap ever supposed to be NULL?

> +        (sector << BDRV_SECTOR_BITS) < bdrv_getlength(bmds->bs)) {

Hm, the problem probably already exists without this patch, but I was
wondering what happens if the result of bdrv_getlength() changes.

An image that is presented to the guest can't grow at the moment, even
though I think there were some thoughts about live resizing. However, do
we make sure that you can't close the image during block migration, by
using eject, change or hot-unplugging a disk?

> +        return !!(bmds->aio_bitmap[chunk / (sizeof(unsigned long) * 8)] &
> +            (1UL << (chunk % (sizeof(unsigned long) * 8))));
> +    } else {
> +        return 0;
> +    }
> +}
> +
> +static void bmds_set_aio_inflight(BlkMigDevState *bmds, int64_t sector_num,
> +                             int nb_sectors, int set)
> +{
> +    int64_t start, end;
> +    unsigned long val, idx, bit;
> +
> +    start = sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK;
> +    end = (sector_num + nb_sectors - 1) / BDRV_SECTORS_PER_DIRTY_CHUNK;
> +
> +    for (; start <= end; start++) {
> +        idx = start / (sizeof(unsigned long) * 8);
> +        bit = start % (sizeof(unsigned long) * 8);
> +        val = bmds->aio_bitmap[idx];
> +        if (set) {
> +            if (!(val & (1UL << bit))) {
> +                val |= 1UL << bit;
> +            }

Why the if? Just doing the |= unconditionally would be enough.

> +        } else {
> +            if (val & (1UL << bit)) {
> +                val &= ~(1UL << bit);
> +            }

Same here.

> +        }
> +        bmds->aio_bitmap[idx] = val;
> +    }
> +}
> +
> +static void alloc_aio_bitmap(BlkMigDevState *bmds)
> +{
> +    BlockDriverState *bs = bmds->bs;
> +    int64_t bitmap_size;
> +
> +    bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS) +
> +            BDRV_SECTORS_PER_DIRTY_CHUNK * 8 - 1;
> +    bitmap_size /= BDRV_SECTORS_PER_DIRTY_CHUNK * 8;
> +
> +    bmds->aio_bitmap = qemu_mallocz(bitmap_size);
> +}
> +
>  static void blk_mig_read_cb(void *opaque, int ret)
>  {
>      BlkMigBlock *blk = opaque;
> @@ -151,6 +204,7 @@ static void blk_mig_read_cb(void *opaque
>      add_avg_read_time(blk->time);
>  
>      QSIMPLEQ_INSERT_TAIL(&block_mig_state.blk_list, blk, entry);
> +    bmds_set_aio_inflight(blk->bmds, blk->sector, blk->nr_sectors, 0);
>  
>      block_mig_state.submitted--;
>      block_mig_state.read_done++;
> @@ -194,6 +248,7 @@ static int mig_save_device_bulk(Monitor 
>      blk->buf = qemu_malloc(BLOCK_SIZE);
>      blk->bmds = bmds;
>      blk->sector = cur_sector;
> +    blk->nr_sectors = nr_sectors;
>  
>      blk->iov.iov_base = blk->buf;
>      blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
> @@ -248,6 +303,7 @@ static void init_blk_migration_it(void *
>          bmds->total_sectors = sectors;
>          bmds->completed_sectors = 0;
>          bmds->shared_base = block_mig_state.shared_base;
> +        alloc_aio_bitmap(bmds);
>  
>          block_mig_state.total_sector_sum += sectors;
>  
> @@ -329,6 +385,8 @@ static int mig_save_device_dirty(Monitor
>      int nr_sectors;
>  
>      for (sector = bmds->cur_dirty; sector < bmds->total_sectors;) {
> +        if (bmds_aio_inflight(bmds, sector))
> +            qemu_aio_flush();

Coding style requires braces.

Also, qemu_aio_flush() isn't a very nice solution. It works and fixes a
bug, but it blocks everything until all requests (and the requests
created by their callbacks) are completed. Can't we just skip this dirty
sector and retry next time?

>          if (bdrv_get_dirty(bmds->bs, sector)) {
>  
>              if (total_sectors - sector < BDRV_SECTORS_PER_DIRTY_CHUNK) {
> @@ -340,6 +398,7 @@ static int mig_save_device_dirty(Monitor
>              blk->buf = qemu_malloc(BLOCK_SIZE);
>              blk->bmds = bmds;
>              blk->sector = sector;
> +            blk->nr_sectors = nr_sectors;
>  
>              if (is_async) {
>                  blk->iov.iov_base = blk->buf;
> @@ -354,6 +413,7 @@ static int mig_save_device_dirty(Monitor
>                      goto error;
>                  }
>                  block_mig_state.submitted++;
> +                bmds_set_aio_inflight(bmds, sector, nr_sectors, 1);

Just to double-check if I understand right: mig_save_device_bulk doesn't
set the bit, but that's okay because it would never try to submit
overlapping requests anyway. Also copying the dirty regions only starts
when no more bulk requests are in flight.

Correct?

>              } else {
>                  if (bdrv_read(bmds->bs, sector, blk->buf,
>                                nr_sectors) < 0) {
> @@ -474,6 +534,7 @@ static void blk_mig_cleanup(Monitor *mon
>  
>      while ((bmds = QSIMPLEQ_FIRST(&block_mig_state.bmds_list)) != NULL) {
>          QSIMPLEQ_REMOVE_HEAD(&block_mig_state.bmds_list, entry);
> +        qemu_free(bmds->aio_bitmap);
>          qemu_free(bmds);
>      }
>  
> 
> 

Kevin

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] Re: [patch 0/3] block migration fixes
  2010-11-09  6:02 ` [Qemu-devel] Re: [patch 0/3] block migration fixes Yoshiaki Tamura
@ 2010-11-09 13:08   ` Marcelo Tosatti
  2010-11-11  9:06     ` Yoshiaki Tamura
  0 siblings, 1 reply; 14+ messages in thread
From: Marcelo Tosatti @ 2010-11-09 13:08 UTC (permalink / raw)
  To: Yoshiaki Tamura; +Cc: Kevin Wolf, ohmura.kei, qemu-devel, Liran Schour

On Tue, Nov 09, 2010 at 03:02:12PM +0900, Yoshiaki Tamura wrote:
> Marcelo Tosatti wrote:
> > Following patchset fixes block migration corruption issues.
> 
> Hi Marcelo,
> 
> Thanks for looking into this issue.  Although we tried your patches, we're still
> seeing the corruption.  If we execute block migration while copying a file
> locally, md5sum of the copied file doesn't match with the original.  Sometimes,
> the filesystem returns an I/O error.
> 
> Could you let us know how you tested and debugged?  Did you use blkverify?

Yoshiaki,

I first reproduced corruption by copying a large file during "migrate
-i", with shared base on qcow2 filesystem, as in your original report.

To debug the problem, file with different byte pattern at every 1MB
(size of dirty chunk) was created and copied directly to an IDE disk in
the guest. Raw format used for the disk image.

With this patchset, i'm not able to reproduce the original issue
anymore.

Can you please provide more details on how to reproduce?

> 
> Thanks,
> 
> Yoshi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] Re: [patch 3/3] block migration: do not submit multiple AIOs for same sector
  2010-11-09 12:42   ` [Qemu-devel] " Kevin Wolf
@ 2010-11-09 13:49     ` Marcelo Tosatti
  0 siblings, 0 replies; 14+ messages in thread
From: Marcelo Tosatti @ 2010-11-09 13:49 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Yoshiaki Tamura, Liran Schour, qemu-devel

On Tue, Nov 09, 2010 at 01:42:10PM +0100, Kevin Wolf wrote:
> > +static int bmds_aio_inflight(BlkMigDevState *bmds, int64_t sector)
> > +{
> > +    int64_t chunk = sector / (int64_t)BDRV_SECTORS_PER_DIRTY_CHUNK;
> > +
> > +    if (bmds->aio_bitmap &&
> 
> Is bmds->aio_bitmap ever supposed to be NULL?

Nope.

> > +        (sector << BDRV_SECTOR_BITS) < bdrv_getlength(bmds->bs)) {
> 
> Hm, the problem probably already exists without this patch, but I was
> wondering what happens if the result of bdrv_getlength() changes.
> 
> An image that is presented to the guest can't grow at the moment, even
> though I think there were some thoughts about live resizing. However, do
> we make sure that you can't close the image during block migration, by
> using eject, change or hot-unplugging a disk?

Don't see how that is the case (yes, it looks broken).

> > +        return !!(bmds->aio_bitmap[chunk / (sizeof(unsigned long) * 8)] &
> > +            (1UL << (chunk % (sizeof(unsigned long) * 8))));
> > +    } else {
> > +        return 0;
> > +    }
> > +}
> > +
> > +static void bmds_set_aio_inflight(BlkMigDevState *bmds, int64_t sector_num,
> > +                             int nb_sectors, int set)
> > +{
> > +    int64_t start, end;
> > +    unsigned long val, idx, bit;
> > +
> > +    start = sector_num / BDRV_SECTORS_PER_DIRTY_CHUNK;
> > +    end = (sector_num + nb_sectors - 1) / BDRV_SECTORS_PER_DIRTY_CHUNK;
> > +
> > +    for (; start <= end; start++) {
> > +        idx = start / (sizeof(unsigned long) * 8);
> > +        bit = start % (sizeof(unsigned long) * 8);
> > +        val = bmds->aio_bitmap[idx];
> > +        if (set) {
> > +            if (!(val & (1UL << bit))) {
> > +                val |= 1UL << bit;
> > +            }
> 
> Why the if? Just doing the |= unconditionally would be enough.
> 
> > +        } else {
> > +            if (val & (1UL << bit)) {
> > +                val &= ~(1UL << bit);
> > +            }
> 
> Same here.
> 
> > +        }
> > +        bmds->aio_bitmap[idx] = val;
> > +    }
> > +}
> > +
> > +static void alloc_aio_bitmap(BlkMigDevState *bmds)
> > +{
> > +    BlockDriverState *bs = bmds->bs;
> > +    int64_t bitmap_size;
> > +
> > +    bitmap_size = (bdrv_getlength(bs) >> BDRV_SECTOR_BITS) +
> > +            BDRV_SECTORS_PER_DIRTY_CHUNK * 8 - 1;
> > +    bitmap_size /= BDRV_SECTORS_PER_DIRTY_CHUNK * 8;
> > +
> > +    bmds->aio_bitmap = qemu_mallocz(bitmap_size);
> > +}
> > +
> >  static void blk_mig_read_cb(void *opaque, int ret)
> >  {
> >      BlkMigBlock *blk = opaque;
> > @@ -151,6 +204,7 @@ static void blk_mig_read_cb(void *opaque
> >      add_avg_read_time(blk->time);
> >  
> >      QSIMPLEQ_INSERT_TAIL(&block_mig_state.blk_list, blk, entry);
> > +    bmds_set_aio_inflight(blk->bmds, blk->sector, blk->nr_sectors, 0);
> >  
> >      block_mig_state.submitted--;
> >      block_mig_state.read_done++;
> > @@ -194,6 +248,7 @@ static int mig_save_device_bulk(Monitor 
> >      blk->buf = qemu_malloc(BLOCK_SIZE);
> >      blk->bmds = bmds;
> >      blk->sector = cur_sector;
> > +    blk->nr_sectors = nr_sectors;
> >  
> >      blk->iov.iov_base = blk->buf;
> >      blk->iov.iov_len = nr_sectors * BDRV_SECTOR_SIZE;
> > @@ -248,6 +303,7 @@ static void init_blk_migration_it(void *
> >          bmds->total_sectors = sectors;
> >          bmds->completed_sectors = 0;
> >          bmds->shared_base = block_mig_state.shared_base;
> > +        alloc_aio_bitmap(bmds);
> >  
> >          block_mig_state.total_sector_sum += sectors;
> >  
> > @@ -329,6 +385,8 @@ static int mig_save_device_dirty(Monitor
> >      int nr_sectors;
> >  
> >      for (sector = bmds->cur_dirty; sector < bmds->total_sectors;) {
> > +        if (bmds_aio_inflight(bmds, sector))
> > +            qemu_aio_flush();
> 
> Coding style requires braces.
> 
> Also, qemu_aio_flush() isn't a very nice solution. It works and fixes a
> bug, but it blocks everything until all requests (and the requests
> created by their callbacks) are completed. Can't we just skip this dirty
> sector and retry next time?

At some point we have to wait on the AIOs that are submitted (or remove
them from the list, which is much trickier). Note that currently it
does skip the request (and submit reads for all other unsubmitted dirty
blocks), before waiting.

So i don't see the need to optimize, yet.

> > +            blk->nr_sectors = nr_sectors;
> >  
> >              if (is_async) {
> >                  blk->iov.iov_base = blk->buf;
> > @@ -354,6 +413,7 @@ static int mig_save_device_dirty(Monitor
> >                      goto error;
> >                  }
> >                  block_mig_state.submitted++;
> > +                bmds_set_aio_inflight(bmds, sector, nr_sectors, 1);
> 
> Just to double-check if I understand right: mig_save_device_bulk doesn't
> set the bit, but that's okay because it would never try to submit
> overlapping requests anyway. Also copying the dirty regions only starts
> when no more bulk requests are in flight.
> 
> Correct?

Yes, correct.

Thanks for your comments, i'll update the patch.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] Re: [patch 0/3] block migration fixes
  2010-11-09 13:08   ` Marcelo Tosatti
@ 2010-11-11  9:06     ` Yoshiaki Tamura
  2010-11-12 15:56       ` Marcelo Tosatti
  0 siblings, 1 reply; 14+ messages in thread
From: Yoshiaki Tamura @ 2010-11-11  9:06 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Kevin Wolf, ohmura.kei, qemu-devel, Liran Schour

Marcelo Tosatti wrote:
> On Tue, Nov 09, 2010 at 03:02:12PM +0900, Yoshiaki Tamura wrote:
>> Marcelo Tosatti wrote:
>>> Following patchset fixes block migration corruption issues.
>>
>> Hi Marcelo,
>>
>> Thanks for looking into this issue.  Although we tried your patches, we're still
>> seeing the corruption.  If we execute block migration while copying a file
>> locally, md5sum of the copied file doesn't match with the original.  Sometimes,
>> the filesystem returns an I/O error.
>>
>> Could you let us know how you tested and debugged?  Did you use blkverify?
>
> Yoshiaki,
>
> I first reproduced corruption by copying a large file during "migrate
> -i", with shared base on qcow2 filesystem, as in your original report.
>
> To debug the problem, file with different byte pattern at every 1MB
> (size of dirty chunk) was created and copied directly to an IDE disk in
> the guest. Raw format used for the disk image.
>
> With this patchset, i'm not able to reproduce the original issue
> anymore.
>
> Can you please provide more details on how to reproduce?

Marcelo,

We double checked and the patchset does seem to fix the problem.  The was a 
mistake in our test procedure.  Sorry for the confusion.

Thanks,

Yoshi

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [Qemu-devel] Re: [patch 0/3] block migration fixes
  2010-11-11  9:06     ` Yoshiaki Tamura
@ 2010-11-12 15:56       ` Marcelo Tosatti
  0 siblings, 0 replies; 14+ messages in thread
From: Marcelo Tosatti @ 2010-11-12 15:56 UTC (permalink / raw)
  To: Yoshiaki Tamura; +Cc: Kevin Wolf, ohmura.kei, qemu-devel, Liran Schour

On Thu, Nov 11, 2010 at 06:06:02PM +0900, Yoshiaki Tamura wrote:
> Marcelo Tosatti wrote:
> >On Tue, Nov 09, 2010 at 03:02:12PM +0900, Yoshiaki Tamura wrote:
> >>Marcelo Tosatti wrote:
> >>>Following patchset fixes block migration corruption issues.
> >>
> >>Hi Marcelo,
> >>
> >>Thanks for looking into this issue.  Although we tried your patches, we're still
> >>seeing the corruption.  If we execute block migration while copying a file
> >>locally, md5sum of the copied file doesn't match with the original.  Sometimes,
> >>the filesystem returns an I/O error.
> >>
> >>Could you let us know how you tested and debugged?  Did you use blkverify?
> >
> >Yoshiaki,
> >
> >I first reproduced corruption by copying a large file during "migrate
> >-i", with shared base on qcow2 filesystem, as in your original report.
> >
> >To debug the problem, file with different byte pattern at every 1MB
> >(size of dirty chunk) was created and copied directly to an IDE disk in
> >the guest. Raw format used for the disk image.
> >
> >With this patchset, i'm not able to reproduce the original issue
> >anymore.
> >
> >Can you please provide more details on how to reproduce?
> 
> Marcelo,
> 
> We double checked and the patchset does seem to fix the problem.
> The was a mistake in our test procedure.  Sorry for the confusion.
> 
> Thanks,
> 
> Yoshi

I was also experiencing corruption on automated test, but it turned out
to be fixed by kvm.git's ae8894c00b560bde4.

Thanks.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [patch 0/3] block migration fixes
  2010-11-08 19:02 [Qemu-devel] [patch 0/3] block migration fixes Marcelo Tosatti
                   ` (3 preceding siblings ...)
  2010-11-09  6:02 ` [Qemu-devel] Re: [patch 0/3] block migration fixes Yoshiaki Tamura
@ 2010-11-21 15:22 ` Anthony Liguori
  2010-11-22  9:43   ` Kevin Wolf
  4 siblings, 1 reply; 14+ messages in thread
From: Anthony Liguori @ 2010-11-21 15:22 UTC (permalink / raw)
  To: Marcelo Tosatti; +Cc: Kevin Wolf, Yoshiaki Tamura, Liran Schour, qemu-devel

On 11/08/2010 01:02 PM, Marcelo Tosatti wrote:
> Following patchset fixes block migration corruption issues
>    

Applied all.  Thanks.

Regards,

Anthony Liguori

>
>    

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Qemu-devel] [patch 0/3] block migration fixes
  2010-11-21 15:22 ` [Qemu-devel] " Anthony Liguori
@ 2010-11-22  9:43   ` Kevin Wolf
  0 siblings, 0 replies; 14+ messages in thread
From: Kevin Wolf @ 2010-11-22  9:43 UTC (permalink / raw)
  To: Anthony Liguori
  Cc: Marcelo Tosatti, Yoshiaki Tamura, Liran Schour, qemu-devel

Am 21.11.2010 16:22, schrieb Anthony Liguori:
> On 11/08/2010 01:02 PM, Marcelo Tosatti wrote:
>> Following patchset fixes block migration corruption issues
>>    
> 
> Applied all.  Thanks.

This was the old version. Anyway, I'll include a diff between v1 and v2
in my next pull request.

Kevin

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2010-11-22  9:43 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-11-08 19:02 [Qemu-devel] [patch 0/3] block migration fixes Marcelo Tosatti
2010-11-08 19:02 ` [Qemu-devel] [patch 1/3] block: fix shift in dirty bitmap calculation Marcelo Tosatti
2010-11-09 12:02   ` [Qemu-devel] " Kevin Wolf
2010-11-08 19:02 ` [Qemu-devel] [patch 2/3] block: set sector dirty on AIO write completion Marcelo Tosatti
2010-11-09 12:02   ` [Qemu-devel] " Kevin Wolf
2010-11-08 19:02 ` [Qemu-devel] [patch 3/3] block migration: do not submit multiple AIOs for same sector Marcelo Tosatti
2010-11-09 12:42   ` [Qemu-devel] " Kevin Wolf
2010-11-09 13:49     ` Marcelo Tosatti
2010-11-09  6:02 ` [Qemu-devel] Re: [patch 0/3] block migration fixes Yoshiaki Tamura
2010-11-09 13:08   ` Marcelo Tosatti
2010-11-11  9:06     ` Yoshiaki Tamura
2010-11-12 15:56       ` Marcelo Tosatti
2010-11-21 15:22 ` [Qemu-devel] " Anthony Liguori
2010-11-22  9:43   ` Kevin Wolf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.