* [Qemu-devel] [PATCH 0/5] block migration fixes
@ 2018-03-08 11:18 Peter Lieven
  2018-03-08 11:18 ` [Qemu-devel] [PATCH 1/5] migration: do not transfer ram during bulk storage migration Peter Lieven
                   ` (4 more replies)
  0 siblings, 5 replies; 17+ messages in thread
From: Peter Lieven @ 2018-03-08 11:18 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: dgilbert, quintela, famz, stefanha, jjherne, Peter Lieven

Peter Lieven (5):
  migration: do not transfer ram during bulk storage migration
  migration/block: reset dirty bitmap before read in bulk phase
  migration/block: rename MAX_INFLIGHT_IO to MAX_IO_BUFFERS
  migration/block: limit the number of parallel I/O requests
  migration/block: compare only read blocks against the rate limiter

 migration/block.c | 17 ++++++++---------
 migration/ram.c   |  8 ++++++++
 2 files changed, 16 insertions(+), 9 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 1/5] migration: do not transfer ram during bulk storage migration
  2018-03-08 11:18 [Qemu-devel] [PATCH 0/5] block migration fixes Peter Lieven
@ 2018-03-08 11:18 ` Peter Lieven
  2018-03-08 11:27   ` Juan Quintela
  2018-03-08 11:18 ` [Qemu-devel] [PATCH 2/5] migration/block: reset dirty bitmap before read in bulk phase Peter Lieven
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Peter Lieven @ 2018-03-08 11:18 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: dgilbert, quintela, famz, stefanha, jjherne, Peter Lieven

This patch makes the bulk phase of a block migration take place
before we start transferring ram. As the bulk block migration
can take a long time, it is pointless to transfer ram during that phase.
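
For reference, the blk_mig_bulk_active() check used below amounts to roughly
the following on the block-migration side (a sketch; the actual definition
lives in migration/block.c):

    bool blk_mig_bulk_active(void)
    {
        /* block migration is in use and its bulk (first full) pass over
         * the disks has not finished yet */
        return blk_mig_active() && !block_mig_state.bulk_completed;
    }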

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 migration/ram.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/migration/ram.c b/migration/ram.c
index 5e33e5c..42468ee 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2258,6 +2258,13 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
     int64_t t0;
     int done = 0;
 
+    if (blk_mig_bulk_active()) {
+        /* Avoid transferring ram during bulk phase of block migration as
+         * the bulk phase will usually take a long time and transferring
+         * ram updates during that time is pointless. */
+        goto out;
+    }
+
     rcu_read_lock();
     if (ram_list.version != rs->last_version) {
         ram_state_reset(rs);
@@ -2304,6 +2311,7 @@ static int ram_save_iterate(QEMUFile *f, void *opaque)
      */
     ram_control_after_iterate(f, RAM_CONTROL_ROUND);
 
+out:
     qemu_put_be64(f, RAM_SAVE_FLAG_EOS);
     ram_counters.transferred += 8;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 2/5] migration/block: reset dirty bitmap before read in bulk phase
  2018-03-08 11:18 [Qemu-devel] [PATCH 0/5] block migration fixes Peter Lieven
  2018-03-08 11:18 ` [Qemu-devel] [PATCH 1/5] migration: do not transfer ram during bulk storage migration Peter Lieven
@ 2018-03-08 11:18 ` Peter Lieven
  2018-03-08 11:39   ` Juan Quintela
  2018-03-08 11:18 ` [Qemu-devel] [PATCH 3/5] migration/block: rename MAX_INFLIGHT_IO to MAX_IO_BUFFERS Peter Lieven
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 17+ messages in thread
From: Peter Lieven @ 2018-03-08 11:18 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: dgilbert, quintela, famz, stefanha, jjherne, Peter Lieven, qemu-stable

Reset the dirty bitmap before reading to make sure we don't miss
any new data: with the old ordering, a guest write that lands between
the read and the bitmap reset sets a dirty bit that the reset then
clears, so the updated data is never re-transferred.

Cc: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
---
 migration/block.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index 1f03946..87bb35c 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -331,11 +331,10 @@ static int mig_save_device_bulk(QEMUFile *f, BlkMigDevState *bmds)
      */
     qemu_mutex_lock_iothread();
     aio_context_acquire(blk_get_aio_context(bmds->blk));
-    blk->aiocb = blk_aio_preadv(bb, cur_sector * BDRV_SECTOR_SIZE, &blk->qiov,
-                                0, blk_mig_read_cb, blk);
-
     bdrv_reset_dirty_bitmap(bmds->dirty_bitmap, cur_sector * BDRV_SECTOR_SIZE,
                             nr_sectors * BDRV_SECTOR_SIZE);
+    blk->aiocb = blk_aio_preadv(bb, cur_sector * BDRV_SECTOR_SIZE, &blk->qiov,
+                                0, blk_mig_read_cb, blk);
     aio_context_release(blk_get_aio_context(bmds->blk));
     qemu_mutex_unlock_iothread();
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 3/5] migration/block: rename MAX_INFLIGHT_IO to MAX_IO_BUFFERS
  2018-03-08 11:18 [Qemu-devel] [PATCH 0/5] block migration fixes Peter Lieven
  2018-03-08 11:18 ` [Qemu-devel] [PATCH 1/5] migration: do not transfer ram during bulk storage migration Peter Lieven
  2018-03-08 11:18 ` [Qemu-devel] [PATCH 2/5] migration/block: reset dirty bitmap before read in bulk phase Peter Lieven
@ 2018-03-08 11:18 ` Peter Lieven
  2018-03-08 11:31   ` Juan Quintela
  2018-03-09 14:58   ` Dr. David Alan Gilbert
  2018-03-08 11:18 ` [Qemu-devel] [PATCH 4/5] migration/block: limit the number of parallel I/O requests Peter Lieven
  2018-03-08 11:18 ` [Qemu-devel] [PATCH 5/5] migration/block: compare only read blocks against the rate limiter Peter Lieven
  4 siblings, 2 replies; 17+ messages in thread
From: Peter Lieven @ 2018-03-08 11:18 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: dgilbert, quintela, famz, stefanha, jjherne, Peter Lieven

This constant actually limits (as the original commit message suggests) the
number of I/O buffers that can be allocated, not the number
of parallel (inflight) I/O requests.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 migration/block.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index 87bb35c..41b95d1 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -36,7 +36,7 @@
 
 #define MAX_IS_ALLOCATED_SEARCH (65536 * BDRV_SECTOR_SIZE)
 
-#define MAX_INFLIGHT_IO 512
+#define MAX_IO_BUFFERS 512
 
 //#define DEBUG_BLK_MIGRATION
 
@@ -775,9 +775,8 @@ static int block_save_iterate(QEMUFile *f, void *opaque)
     while ((block_mig_state.submitted +
             block_mig_state.read_done) * BLOCK_SIZE <
            qemu_file_get_rate_limit(f) &&
-           (block_mig_state.submitted +
-            block_mig_state.read_done) <
-           MAX_INFLIGHT_IO) {
+           (block_mig_state.submitted + block_mig_state.read_done) <
+           MAX_IO_BUFFERS) {
         blk_mig_unlock();
         if (block_mig_state.bulk_completed == 0) {
             /* first finish the bulk phase */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 4/5] migration/block: limit the number of parallel I/O requests
  2018-03-08 11:18 [Qemu-devel] [PATCH 0/5] block migration fixes Peter Lieven
                   ` (2 preceding siblings ...)
  2018-03-08 11:18 ` [Qemu-devel] [PATCH 3/5] migration/block: rename MAX_INFLIGHT_IO to MAX_IO_BUFFERS Peter Lieven
@ 2018-03-08 11:18 ` Peter Lieven
  2018-03-08 12:50   ` Juan Quintela
                     ` (2 more replies)
  2018-03-08 11:18 ` [Qemu-devel] [PATCH 5/5] migration/block: compare only read blocks against the rate limiter Peter Lieven
  4 siblings, 3 replies; 17+ messages in thread
From: Peter Lieven @ 2018-03-08 11:18 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: dgilbert, quintela, famz, stefanha, jjherne, Peter Lieven

The current implementation submits up to 512 I/O requests in parallel,
which is much too high, especially for a background task.
This patch adds a maximum limit of 16 I/O requests that can
be submitted in parallel to avoid monopolizing the I/O device.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 migration/block.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/migration/block.c b/migration/block.c
index 41b95d1..ce939e2 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -37,6 +37,7 @@
 #define MAX_IS_ALLOCATED_SEARCH (65536 * BDRV_SECTOR_SIZE)
 
 #define MAX_IO_BUFFERS 512
+#define MAX_PARALLEL_IO 16
 
 //#define DEBUG_BLK_MIGRATION
 
@@ -775,6 +776,7 @@ static int block_save_iterate(QEMUFile *f, void *opaque)
     while ((block_mig_state.submitted +
             block_mig_state.read_done) * BLOCK_SIZE <
            qemu_file_get_rate_limit(f) &&
+           block_mig_state.submitted < MAX_PARALLEL_IO &&
            (block_mig_state.submitted + block_mig_state.read_done) <
            MAX_IO_BUFFERS) {
         blk_mig_unlock();
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [Qemu-devel] [PATCH 5/5] migration/block: compare only read blocks against the rate limiter
  2018-03-08 11:18 [Qemu-devel] [PATCH 0/5] block migration fixes Peter Lieven
                   ` (3 preceding siblings ...)
  2018-03-08 11:18 ` [Qemu-devel] [PATCH 4/5] migration/block: limit the number of parallel I/O requests Peter Lieven
@ 2018-03-08 11:18 ` Peter Lieven
  2018-03-20 18:02   ` Juan Quintela
  4 siblings, 1 reply; 17+ messages in thread
From: Peter Lieven @ 2018-03-08 11:18 UTC (permalink / raw)
  To: qemu-devel, qemu-block
  Cc: dgilbert, quintela, famz, stefanha, jjherne, Peter Lieven

Only read_done blocks are queued to be flushed to the migration
stream; submitted blocks are still in flight.

Signed-off-by: Peter Lieven <pl@kamp.de>
---
 migration/block.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/migration/block.c b/migration/block.c
index ce939e2..4e950c2 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -773,8 +773,7 @@ static int block_save_iterate(QEMUFile *f, void *opaque)
 
     /* control the rate of transfer */
     blk_mig_lock();
-    while ((block_mig_state.submitted +
-            block_mig_state.read_done) * BLOCK_SIZE <
+    while (block_mig_state.read_done * BLOCK_SIZE <
            qemu_file_get_rate_limit(f) &&
            block_mig_state.submitted < MAX_PARALLEL_IO &&
            (block_mig_state.submitted + block_mig_state.read_done) <
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 1/5] migration: do not transfer ram during bulk storage migration
  2018-03-08 11:18 ` [Qemu-devel] [PATCH 1/5] migration: do not transfer ram during bulk storage migration Peter Lieven
@ 2018-03-08 11:27   ` Juan Quintela
  0 siblings, 0 replies; 17+ messages in thread
From: Juan Quintela @ 2018-03-08 11:27 UTC (permalink / raw)
  To: Peter Lieven; +Cc: qemu-devel, qemu-block, dgilbert, famz, stefanha, jjherne

Peter Lieven <pl@kamp.de> wrote:
> This patch makes the bulk phase of a block migration take place
> before we start transferring ram. As the bulk block migration
> can take a long time, it is pointless to transfer ram during that phase.
>
> Signed-off-by: Peter Lieven <pl@kamp.de>
> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 3/5] migration/block: rename MAX_INFLIGHT_IO to MAX_IO_BUFFERS
  2018-03-08 11:18 ` [Qemu-devel] [PATCH 3/5] migration/block: rename MAX_INFLIGHT_IO to MAX_IO_BUFFERS Peter Lieven
@ 2018-03-08 11:31   ` Juan Quintela
  2018-03-09 14:58   ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 17+ messages in thread
From: Juan Quintela @ 2018-03-08 11:31 UTC (permalink / raw)
  To: Peter Lieven; +Cc: qemu-devel, qemu-block, dgilbert, famz, stefanha, jjherne

Peter Lieven <pl@kamp.de> wrote:
> This constant actually limits (as the original commit message suggests) the
> number of I/O buffers that can be allocated, not the number
> of parallel (inflight) I/O requests.
>
> Signed-off-by: Peter Lieven <pl@kamp.de>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 2/5] migration/block: reset dirty bitmap before read in bulk phase
  2018-03-08 11:18 ` [Qemu-devel] [PATCH 2/5] migration/block: reset dirty bitmap before read in bulk phase Peter Lieven
@ 2018-03-08 11:39   ` Juan Quintela
  0 siblings, 0 replies; 17+ messages in thread
From: Juan Quintela @ 2018-03-08 11:39 UTC (permalink / raw)
  To: Peter Lieven
  Cc: qemu-devel, qemu-block, dgilbert, famz, stefanha, jjherne, qemu-stable

Peter Lieven <pl@kamp.de> wrote:
> Reset the dirty bitmap before reading to make sure we don't miss
> any new data.
>
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Peter Lieven <pl@kamp.de>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 4/5] migration/block: limit the number of parallel I/O requests
  2018-03-08 11:18 ` [Qemu-devel] [PATCH 4/5] migration/block: limit the number of parallel I/O requests Peter Lieven
@ 2018-03-08 12:50   ` Juan Quintela
  2018-03-08 13:30     ` Peter Lieven
  2018-03-20 18:02   ` Juan Quintela
  2018-03-23 16:45   ` Dr. David Alan Gilbert
  2 siblings, 1 reply; 17+ messages in thread
From: Juan Quintela @ 2018-03-08 12:50 UTC (permalink / raw)
  To: Peter Lieven; +Cc: qemu-devel, qemu-block, dgilbert, famz, stefanha, jjherne

Peter Lieven <pl@kamp.de> wrote:
> The current implementation submits up to 512 I/O requests in parallel,
> which is much too high, especially for a background task.
> This patch adds a maximum limit of 16 I/O requests that can
> be submitted in parallel to avoid monopolizing the I/O device.
>
> Signed-off-by: Peter Lieven <pl@kamp.de>

This code is already a mess (TM).  It is difficult to understand what we
are doing, so not easy to see if your changes are better/worse than what
we have.

I am not sure that your solution helps to improve things here.  Let's see
what I understand.

We have three fields (without a single comment):

- submitted: this is the number of blocks that we have asked the block
             device to read asynchronously to main memory, and that
             haven't been read yet.  I.e. "blocks_read_pending" would be a
             better name?

- read_done: this is the number of blocks that we have finished reading
             asynchronously from this block device.  When we finish, we
             do a submitted -- and a read_done++.  blocks_read_finished
             name?

- transferred: This is the number of blocks that we have transferred
               since the beginning of migration.  At this point, we do a
               read_done-- and a transferred++

Note also that we do malloc()/free() for each block

So, now that we have defined what our fields mean, we need to know what
is our test.  block_save_pending():

    get_remaining_dirty() + (submitted + read_done) * BLOCK_SIZE

Looks good.

But let us see what test we do in block_save_iterate() (Yes, I have
been very liberal with reformatting and removal of struct names):

(submitted + read_done) * BLOCK_SIZE <  qemu_file_get_rate_limit(f) &&
(submitted + read_done) < MAX_INFLIGHT_IO

The idea of that test is to make sure that we _don't send_ through the
QEMUFile more than qemu_file_get_rate_limit(f).  But there are several
things here:
- we have already issued a flush_blks() call before we enter the loop
  And it is quite possible that we have already sent too much data at
  this point, but we enter the while loop anyways.

  Notice that flush_blks() does the right thing and tests in each loop if
  qemu_file_rate_limit() has been reached and stops sending more data if
  it returns true;

- At this point, we *could* have sent all that can be sent for this
  round, but we enter the loop anyways.  And we test two things:
     - that we haven't read from block devices more than
       qemu_file_get_rate_limit() bytes (notice that this is the maximum
       that we could put through the migration channel, not really
       related  with what we read from block devices).

     - that we haven't read in this round more than MAX_INFLIGHT_IO
       blocks.  That is 512 blocks, which at BLOCK_SIZE bytes each is 512MB. Why?
       Who knows.

- Now we exit the while loop, and we have pending blocks to send, the
    minimum between:
       - qemu_file_get_rate_limit()/BLOCK_SIZE, and
       - MAX_INFLIGHT_IO

But not all of them are ready to send, only "read_done" blocks are
ready, the "submitted" ones are still waiting for read completion.

And we call back flush_blks() that will try to send all the "read_done"
blocks through the migration channel until we hit
qemu_file_rate_limit().
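
To make the bookkeeping above concrete, here is a small, self-contained toy
model of the three counters and of the two conditions that gate the read loop
(purely illustrative, not the QEMU code; the constants mirror migration/block.c):

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    #define BLOCK_SIZE      (1 << 20)   /* each block is 1 MB */
    #define MAX_INFLIGHT_IO 512         /* the limit patch 3 renames to MAX_IO_BUFFERS */

    static int submitted;    /* async reads issued, not yet completed */
    static int read_done;    /* reads completed, waiting to be sent   */
    static int transferred;  /* blocks already pushed into the stream */

    static bool may_submit_more(int64_t rate_limit)
    {
        /* the two tests block_save_iterate() applies before queueing
         * another asynchronous read */
        return (submitted + read_done) * BLOCK_SIZE < rate_limit &&
               (submitted + read_done) < MAX_INFLIGHT_IO;
    }

    int main(void)
    {
        int64_t rate_limit = 32 * BLOCK_SIZE;  /* pretend the rate limit is 32 MB */

        while (may_submit_more(rate_limit)) {
            submitted++;                       /* blk_aio_preadv() would go here  */
        }
        /* the read callback would later do submitted--, read_done++, and
         * flush_blks() would do read_done--, transferred++ per block sent */
        printf("queued %d reads before hitting a limit (transferred=%d)\n",
               submitted, transferred);
        return 0;
    }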

So, looking at the problem from far away:

- how many read requests do we have to have in flight at any moment; is
  the 16 from this series the right number? Notice that each request
  is 1MB, so this is 16MB (I have no clue what the right value is).

- how many blocks should we get on each round.  Notice that the reason
  for the 1st patch in this series is that the block layer is not
  sending enough blocks to prevent ram migration from starting.  If there is
  enough dirty memory sent from the block layer, we shouldn't ever enter
  the ram stage.
  Notice how we register them in vl.c:

    blk_mig_init();
    ram_mig_init();

So, after so long mail, do I have some suggestion?

- should we make MAX_PARALLEL_IO self-tuning?  i.e. if we are not
  able to get qemu_file_get_rate_limit()/BLOCK_SIZE read_blocks per
  iteration, should we increase the MAX_PARALLEL_IO limit?

- should we take into account how many blocks the 1st call to
  flush_blks() has transferred and only wait for that many "read_blocks"
  before the next flush_blks() instead of for the whole set?

Later, Juan.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 4/5] migration/block: limit the number of parallel I/O requests
  2018-03-08 12:50   ` Juan Quintela
@ 2018-03-08 13:30     ` Peter Lieven
  2018-03-20 13:15       ` Peter Lieven
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Lieven @ 2018-03-08 13:30 UTC (permalink / raw)
  To: quintela; +Cc: qemu-devel, qemu-block, dgilbert, famz, stefanha, jjherne

On 08.03.2018 at 13:50, Juan Quintela wrote:
> Peter Lieven <pl@kamp.de> wrote:
>> The current implementation submits up to 512 I/O requests in parallel,
>> which is much too high, especially for a background task.
>> This patch adds a maximum limit of 16 I/O requests that can
>> be submitted in parallel to avoid monopolizing the I/O device.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
> This code is already a mess (TM).  It is difficult to understand what we
> are doing, so not easy to see if your changes are better/worse than what
> we have.
>
> I am not sure that your solution helps to improve things here.  Let's see
> what I understand.
>
> We have three fields (without a single comment):
>
> - submitted: this is the number of blocks that we have asked the block
>               device to read asynchronously to main memory, and that
>               haven't been read yet.  I.e. "blocks_read_pending" would be a
>               better name?
>
> - read_done: this is the number of blocks that we have finished reading
>               asynchronously from this block device.  When we finish, we
>               do a submitted -- and a read_done++.  blocks_read_finished
>               name?

Correct. The names should be adjusted.

>
> - transferred: This is the number of blocks that we have transferred
>                 since the beginning of migration.  At this point, we do a
>                 read_done-- and a transferred++
>
> Note also that we do malloc()/free() for each block

Yes, but that is a different story.

>
> So, now that we have defined what our fields mean, we need to know what
> is our test.  block_save_pending():
>
>      get_remaining_dirty() + (submitted + read_done) * BLOCK_SIZE
>
> Looks good.
>
> But let us see what test we do in block_save_iterate() (Yes, I have
> been very liberal with reformatting and removal of struct names):
>
> (submitted + read_done) * BLOCK_SIZE <  qemu_file_get_rate_limit(f) &&
> (submitted + read_done) < MAX_INFLIGHT_IO
>
> The idea of that test is to make sure that we _don't send_ through the
> QEMUFile more than qemu_file_get_rate_limit(f).  But there are several
> things here:
> - we have already issued a flush_blks() call before we enter the loop
>    And it is inside possibility that we have already sent too much data at
>    this point, but we enter the while loop anyways.
>
>    Notice that flush_blks() does the right thing and test in each loop if
>    qemu_file_rate_limit() has been reached and stops sending more data if
>    it returns true;
>
> - At this point, we *could* have sent all that can be sent for this
>    round, but we enter the loop anyways.  And we test two things:
>       - that we haven't read from block devices more than
>         qemu_file_get_rate_limit() bytes (notice that this is the maximum
>         that we could put through the migration channel, not really
>         related  with what we read from block devices).
>
>       - that we haven't read in this round more than MAX_INFLIGHT_IO
>         blocks.  That is 512 blocks, at BLOCK_SIZE bytes is 512MB. Why?
>         Who knows.

The idea behind this was only to limit the maximum memory that is allocated.
That was not meant as a rate limit for the storage backend.

>
> - Now we exit the while loop, and we have pending blocks to send, the
>      minimum between:
>         - qemu_file_get_rate_limit()/BLOCK_SIZE, and
>         - MAX_INFLIGHT_IO
>
> But not all of them are ready to send, only "read_done" blocks are
> ready, the "submitted" ones are still waiting for read completion.

That's what I tried to address in patch 5.

>
> And we call back flush_blks() that will try to send all the "read_done"
> blocks through the migration channel until we hit
> qemu_file_rate_limit().
>
> So, looking at the problem from far away:
>
> - how many read requests do we have to have in flight at any moment; is
>    the 16 from this series the right number? Notice that each request
>    is 1MB, so this is 16MB (I have no clue what the right value is).

I chose that value because it helped to reduce the stalls in the
guest that I have been seeing. Stefan also said that 16 would be a good
value for a background task. 512 is definitely too much.

>
> - how many blocks should we get on each round.  Notice that the reason
>    for the 1st patch in this series is that the block layer is not
>    sending enough blocks to prevent ram migration from starting.  If there is
>    enough dirty memory sent from the block layer, we shouldn't ever enter
>    the ram stage.
>    Notice how we register them in vl.c:
>
>      blk_mig_init();
>      ram_mig_init();

The idea of the 1st patch was to skip ram migration until we have completed
the first round of block migration (aka bulk phase) as this will take a long time.
After that we are only doing incremental updates. You are right this might still
be too early, but starting after the bulk phase was an easy choice without
introducing another heuristic.

>
> So, after so long mail, do I have some suggestion?
>
> - should we make MAX_PARALLEL_IO self-tuning?  i.e. if we are not
>    able to get qemu_file_get_rate_limit()/BLOCK_SIZE read_blocks per
>    iteration, should we increase the MAX_PARALLEL_IO limit?

We should not; if the bandwidth of the migration stream is high, we would again
monopolize the I/O device and get stalls in guest I/O.

>
> - should we take into account how many blocks the 1st call to
>    flush_blks() has transferred and only wait for that many "read_blocks"
>    before the next flush_blks() instead of for the whole set?

We still have to avoid having too much parallel I/O. But I was also thinking that
we need a time limit (as the ram migration has). We might sit in the while loop until
512MB have been read if we have a fast source storage and a high migration bandwidth
so that we never reach the rate limit.
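
As a toy sketch of the kind of time bound meant here, modelled on the roughly
50 ms budget that ram_save_iterate() already uses (hypothetical, not part of
this series):

    #include <stdbool.h>
    #include <stdint.h>
    #include <time.h>

    static int64_t now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    }

    /* stand-in for queueing one more 1 MB asynchronous read */
    static bool submit_one_read(void) { return true; }

    void fill_read_queue(void)
    {
        int64_t t0 = now_ns();

        for (int i = 0; submit_one_read(); i++) {
            /* look at the clock only every 64 submissions, as ram_save_iterate() does */
            if ((i & 63) == 0 && (now_ns() - t0) / 1000000 > 50) {
                break;   /* ~50 ms spent, yield back to the main loop */
            }
        }
    }

    int main(void)
    {
        fill_read_queue();   /* returns after roughly 50 ms of submissions */
        return 0;
    }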

Peter

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 3/5] migration/block: rename MAX_INFLIGHT_IO to MAX_IO_BUFFERS
  2018-03-08 11:18 ` [Qemu-devel] [PATCH 3/5] migration/block: rename MAX_INFLIGHT_IO to MAX_IO_BUFFERS Peter Lieven
  2018-03-08 11:31   ` Juan Quintela
@ 2018-03-09 14:58   ` Dr. David Alan Gilbert
  2018-03-09 19:57     ` Peter Lieven
  1 sibling, 1 reply; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-09 14:58 UTC (permalink / raw)
  To: Peter Lieven; +Cc: qemu-devel, qemu-block, quintela, famz, stefanha, jjherne

* Peter Lieven (pl@kamp.de) wrote:
> This constant actually limits (as the original commit message suggests) the
> number of I/O buffers that can be allocated, not the number
> of parallel (inflight) I/O requests.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>

I've queued 1-3 (which have Juan's R-b).

Dave

> ---
>  migration/block.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/migration/block.c b/migration/block.c
> index 87bb35c..41b95d1 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -36,7 +36,7 @@
>  
>  #define MAX_IS_ALLOCATED_SEARCH (65536 * BDRV_SECTOR_SIZE)
>  
> -#define MAX_INFLIGHT_IO 512
> +#define MAX_IO_BUFFERS 512
>  
>  //#define DEBUG_BLK_MIGRATION
>  
> @@ -775,9 +775,8 @@ static int block_save_iterate(QEMUFile *f, void *opaque)
>      while ((block_mig_state.submitted +
>              block_mig_state.read_done) * BLOCK_SIZE <
>             qemu_file_get_rate_limit(f) &&
> -           (block_mig_state.submitted +
> -            block_mig_state.read_done) <
> -           MAX_INFLIGHT_IO) {
> +           (block_mig_state.submitted + block_mig_state.read_done) <
> +           MAX_IO_BUFFERS) {
>          blk_mig_unlock();
>          if (block_mig_state.bulk_completed == 0) {
>              /* first finish the bulk phase */
> -- 
> 2.7.4
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 3/5] migration/block: rename MAX_INFLIGHT_IO to MAX_IO_BUFFERS
  2018-03-09 14:58   ` Dr. David Alan Gilbert
@ 2018-03-09 19:57     ` Peter Lieven
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Lieven @ 2018-03-09 19:57 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: qemu-devel, qemu-block, quintela, famz, stefanha, jjherne

On 09.03.2018 at 15:58, Dr. David Alan Gilbert wrote:
> * Peter Lieven (pl@kamp.de) wrote:
>> This constant actually limits (as the original commit message suggests) the
>> number of I/O buffers that can be allocated, not the number
>> of parallel (inflight) I/O requests.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
> I've queued 1-3 (which have Juan's R-b).

Thank you. It would be good if we find a proper solution for the rest as this is also important.

Peter

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 4/5] migration/block: limit the number of parallel I/O requests
  2018-03-08 13:30     ` Peter Lieven
@ 2018-03-20 13:15       ` Peter Lieven
  0 siblings, 0 replies; 17+ messages in thread
From: Peter Lieven @ 2018-03-20 13:15 UTC (permalink / raw)
  To: quintela; +Cc: qemu-devel, qemu-block, dgilbert, famz, stefanha

On 08.03.2018 at 14:30, Peter Lieven wrote:
> On 08.03.2018 at 13:50, Juan Quintela wrote:
>> Peter Lieven <pl@kamp.de> wrote:
>>> The current implementation submits up to 512 I/O requests in parallel,
>>> which is much too high, especially for a background task.
>>> This patch adds a maximum limit of 16 I/O requests that can
>>> be submitted in parallel to avoid monopolizing the I/O device.
>>>
>>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> This code is already a mess (TM).  It is difficult to understand what we
>> are doing, so not easy to see if your changes are better/worse than what
>> we have.
>>
>> I am not sure that your solution helps to improve things here.  Let's see
>> what I understand.
>>
>> We have three fields (without a single comment):
>>
>> - submitted: this is the number of blocks that we have asked the block
>>               device to read asynchronously to main memory, and that
>>               haven't been read yet.  I.e. "blocks_read_pending" would be a
>>               better name?
>>
>> - read_done: this is the number of blocks that we have finished reading
>>               asynchronously from this block device.  When we finish, we
>>               do a submitted -- and a read_done++. blocks_read_finished
>>               name?
>
> Correct. The names should be adjusted.
>
>>
>> - transferred: This is the number of blocks that we have transferred
>>                 since the beginning of migration.  At this point, we do a
>>                 read_done-- and a transferred++
>>
>> Note also that we do malloc()/free() for each block
>
> Yes, but that is a different story.
>
>>
>> So, now that we have defined what our fields mean, we need to know what
>> is our test.  block_save_pending():
>>
>>      get_remaining_dirty() + (submitted + read_done) * BLOCK_SIZE
>>
>> Looks good.
>>
>> But let us see what test we do in block_save_iterate() (Yes, I have
>> been very liberal with reformatting and removal of struct names):
>>
>> (submitted + read_done) * BLOCK_SIZE < qemu_file_get_rate_limit(f) &&
>> (submitted + read_done) < MAX_INFLIGHT_IO
>>
>> The idea of that test is to make sure that we _don't send_ through the
>> QEMUFile more than qemu_file_get_rate_limit(f).  But there are several
>> things here:
>> - we have already issued a flush_blks() call before we enter the loop
>>    And it is inside possibility that we have already sent too much data at
>>    this point, but we enter the while loop anyways.
>>
>>    Notice that flush_blks() does the right thing and test in each loop if
>>    qemu_file_rate_limit() has been reached and stops sending more data if
>>    it returns true;
>>
>> - At this point, we *could* have sent all that can be sent for this
>>    round, but we enter the loop anyways.  And we test two things:
>>       - that we haven't read from block devices more than
>>         qemu_file_get_rate_limit() bytes (notice that this is the maximum
>>         that we could put through the migration channel, not really
>>         related  with what we read from block devices).
>>
>>       - that we haven't read in this round more than MAX_INFLIGHT_IO
>>         blocks.  That is 512 blocks, at BLOCK_SIZE bytes is 512MB. Why?
>>         Who knows.
>
> The idea behind this was only to limit the maximum memory that is allocated.
> That was not meant as a rate limit for the storage backend.
>
>>
>> - Now we exit the while loop, and we have pending blocks to send, the
>>      minimum between:
>>         - qemu_file_get_rate_limit()/BLOCK_SIZE, and
>>         - MAX_INFLIGHT_IO
>>
>> But not all of them are ready to send, only "read_done" blocks are
>> ready, the "submitted" ones are still waiting for read completion.
>
> That's what I tried to address in patch 5.
>
>>
>> And we call back flush_blks() that will try to send all the "read_done"
>> blocks through the migration channel until we hit
>> qemu_file_rate_limit().
>>
>> So, looking at the problem from far away:
>>
>> - how many read requests do we have to have in flight at any moment; is
>>    the 16 from this series the right number? Notice that each request
>>    is 1MB, so this is 16MB (I have no clue what the right value is).
>
> I chose that value because it helped to reduce the stalls in the
> guest that I have been seeing. Stefan also said that 16 would be a good
> value for a background task. 512 is definitely too much.
>
>>
>> - how many blocks should we get on each round.  Notice that the reason
>>    for the 1st patch in this series is that the block layer is not
>>    sending enough blocks to prevent ram migration from starting.  If there is
>>    enough dirty memory sent from the block layer, we shouldn't ever enter
>>    the ram stage.
>>    Notice how we register them in vl.c:
>>
>>      blk_mig_init();
>>      ram_mig_init();
>
> The idea of the 1st patch was to skip ram migration until we have completed
> the first round of block migration (aka bulk phase) as this will take a long time.
> After that we are only doing incremental updates. You are right this might still
> be too early, but starting after the bulk phase was an easy choice without
> introducing another heuristic.
>
>>
>> So, after so long mail, do I have some suggestion?
>>
>> - should we make MAX_PARALLEL_IO self-tuning?  i.e. if we are not
>>    able to get qemu_file_get_rate_limit()/BLOCK_SIZE read_blocks per
>>    iteration, should we increase the MAX_PARALLEL_IO limit?
>
> We should not; if the bandwidth of the migration stream is high, we would again
> monopolize the I/O device and get stalls in guest I/O.
>
>>
>> - should we take into account how many blocks the 1st call to
>>    flush_blks() has transferred and only wait for that many "read_blocks"
>>    before the next flush_blks() instead of for the whole set?
>
> We still have to avoid having too much parallel I/O. But I was also thinking that
> we need a time limit (as the ram migration has). We might sit in the while loop until
> 512MB have been read if we have a fast source storage and a high migration bandwidth
> so that we never reach the rate limit.
>
> Peter

Hi,

Any idea how we can continue with this? I can prepare a new series with a patch that adjusts
the naming of the variables and hope this makes things clearer.

Thanks,
Peter

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 4/5] migration/block: limit the number of parallel I/O requests
  2018-03-08 11:18 ` [Qemu-devel] [PATCH 4/5] migration/block: limit the number of parallel I/O requests Peter Lieven
  2018-03-08 12:50   ` Juan Quintela
@ 2018-03-20 18:02   ` Juan Quintela
  2018-03-23 16:45   ` Dr. David Alan Gilbert
  2 siblings, 0 replies; 17+ messages in thread
From: Juan Quintela @ 2018-03-20 18:02 UTC (permalink / raw)
  To: Peter Lieven; +Cc: qemu-devel, qemu-block, dgilbert, famz, stefanha, jjherne

Peter Lieven <pl@kamp.de> wrote:
> The current implementation submits up to 512 I/O requests in parallel,
> which is much too high, especially for a background task.
> This patch adds a maximum limit of 16 I/O requests that can
> be submitted in parallel to avoid monopolizing the I/O device.
>
> Signed-off-by: Peter Lieven <pl@kamp.de>

Reviewed-by: Juan Quintela <quintela@redhat.com>

PD.  I can't see a trivial way to change things without refactoring the
whole code.

> ---
>  migration/block.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/migration/block.c b/migration/block.c
> index 41b95d1..ce939e2 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -37,6 +37,7 @@
>  #define MAX_IS_ALLOCATED_SEARCH (65536 * BDRV_SECTOR_SIZE)
>  
>  #define MAX_IO_BUFFERS 512
> +#define MAX_PARALLEL_IO 16
>  
>  //#define DEBUG_BLK_MIGRATION
>  
> @@ -775,6 +776,7 @@ static int block_save_iterate(QEMUFile *f, void *opaque)
>      while ((block_mig_state.submitted +
>              block_mig_state.read_done) * BLOCK_SIZE <
>             qemu_file_get_rate_limit(f) &&
> +           block_mig_state.submitted < MAX_PARALLEL_IO &&
>             (block_mig_state.submitted + block_mig_state.read_done) <
>             MAX_IO_BUFFERS) {
>          blk_mig_unlock();

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 5/5] migration/block: compare only read blocks against the rate limiter
  2018-03-08 11:18 ` [Qemu-devel] [PATCH 5/5] migration/block: compare only read blocks against the rate limiter Peter Lieven
@ 2018-03-20 18:02   ` Juan Quintela
  0 siblings, 0 replies; 17+ messages in thread
From: Juan Quintela @ 2018-03-20 18:02 UTC (permalink / raw)
  To: Peter Lieven; +Cc: qemu-devel, qemu-block, dgilbert, famz, stefanha, jjherne

Peter Lieven <pl@kamp.de> wrote:
> Only read_done blocks are queued to be flushed to the migration
> stream; submitted blocks are still in flight.
>
> Signed-off-by: Peter Lieven <pl@kamp.de>

Reviewed-by: Juan Quintela <quintela@redhat.com>


> ---
>  migration/block.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/migration/block.c b/migration/block.c
> index ce939e2..4e950c2 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -773,8 +773,7 @@ static int block_save_iterate(QEMUFile *f, void *opaque)
>  
>      /* control the rate of transfer */
>      blk_mig_lock();
> -    while ((block_mig_state.submitted +
> -            block_mig_state.read_done) * BLOCK_SIZE <
> +    while (block_mig_state.read_done * BLOCK_SIZE <
>             qemu_file_get_rate_limit(f) &&
>             block_mig_state.submitted < MAX_PARALLEL_IO &&
>             (block_mig_state.submitted + block_mig_state.read_done) <

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Qemu-devel] [PATCH 4/5] migration/block: limit the number of parallel I/O requests
  2018-03-08 11:18 ` [Qemu-devel] [PATCH 4/5] migration/block: limit the number of parallel I/O requests Peter Lieven
  2018-03-08 12:50   ` Juan Quintela
  2018-03-20 18:02   ` Juan Quintela
@ 2018-03-23 16:45   ` Dr. David Alan Gilbert
  2 siblings, 0 replies; 17+ messages in thread
From: Dr. David Alan Gilbert @ 2018-03-23 16:45 UTC (permalink / raw)
  To: Peter Lieven; +Cc: qemu-devel, qemu-block, quintela, famz, stefanha, jjherne

* Peter Lieven (pl@kamp.de) wrote:
> The current implementation submits up to 512 I/O requests in parallel,
> which is much too high, especially for a background task.
> This patch adds a maximum limit of 16 I/O requests that can
> be submitted in parallel to avoid monopolizing the I/O device.
> 
> Signed-off-by: Peter Lieven <pl@kamp.de>

Queued (including 5/5)
> ---
>  migration/block.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/migration/block.c b/migration/block.c
> index 41b95d1..ce939e2 100644
> --- a/migration/block.c
> +++ b/migration/block.c
> @@ -37,6 +37,7 @@
>  #define MAX_IS_ALLOCATED_SEARCH (65536 * BDRV_SECTOR_SIZE)
>  
>  #define MAX_IO_BUFFERS 512
> +#define MAX_PARALLEL_IO 16
>  
>  //#define DEBUG_BLK_MIGRATION
>  
> @@ -775,6 +776,7 @@ static int block_save_iterate(QEMUFile *f, void *opaque)
>      while ((block_mig_state.submitted +
>              block_mig_state.read_done) * BLOCK_SIZE <
>             qemu_file_get_rate_limit(f) &&
> +           block_mig_state.submitted < MAX_PARALLEL_IO &&
>             (block_mig_state.submitted + block_mig_state.read_done) <
>             MAX_IO_BUFFERS) {
>          blk_mig_unlock();
> -- 
> 2.7.4
> 
> 
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK

^ permalink raw reply	[flat|nested] 17+ messages in thread


Thread overview: 17+ messages
2018-03-08 11:18 [Qemu-devel] [PATCH 0/5] block migration fixes Peter Lieven
2018-03-08 11:18 ` [Qemu-devel] [PATCH 1/5] migration: do not transfer ram during bulk storage migration Peter Lieven
2018-03-08 11:27   ` Juan Quintela
2018-03-08 11:18 ` [Qemu-devel] [PATCH 2/5] migration/block: reset dirty bitmap before read in bulk phase Peter Lieven
2018-03-08 11:39   ` Juan Quintela
2018-03-08 11:18 ` [Qemu-devel] [PATCH 3/5] migration/block: rename MAX_INFLIGHT_IO to MAX_IO_BUFFERS Peter Lieven
2018-03-08 11:31   ` Juan Quintela
2018-03-09 14:58   ` Dr. David Alan Gilbert
2018-03-09 19:57     ` Peter Lieven
2018-03-08 11:18 ` [Qemu-devel] [PATCH 4/5] migration/block: limit the number of parallel I/O requests Peter Lieven
2018-03-08 12:50   ` Juan Quintela
2018-03-08 13:30     ` Peter Lieven
2018-03-20 13:15       ` Peter Lieven
2018-03-20 18:02   ` Juan Quintela
2018-03-23 16:45   ` Dr. David Alan Gilbert
2018-03-08 11:18 ` [Qemu-devel] [PATCH 5/5] migration/block: compare only read blocks against the rate limiter Peter Lieven
2018-03-20 18:02   ` Juan Quintela
