From: Peter Lieven <pl@kamp.de>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Fam Zheng <famz@redhat.com>, Wen Congyang <wency@cn.fujitsu.com>,
	qemu block <qemu-block@nongnu.org>,
	Juan Quintela <quintela@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [Qemu-block] block migration and MAX_IN_FLIGHT_IO
Date: Wed, 7 Mar 2018 21:35:42 +0100
Message-ID: <14f9de5c-5e99-1047-ace0-6ab0c7850945@kamp.de>
In-Reply-To: <CAJSP0QXTFz55WmF8M7Lc8hLjeD3YpSXCQx+Q8y9=9CN8NW8EJw@mail.gmail.com>

On 07.03.2018 at 10:47, Stefan Hajnoczi wrote:
> On Wed, Mar 7, 2018 at 7:55 AM, Peter Lieven <pl@kamp.de> wrote:
>> On 06.03.2018 at 17:35, Peter Lieven wrote:
>>> On 06.03.2018 at 17:07, Stefan Hajnoczi wrote:
>>>> On Mon, Mar 05, 2018 at 02:52:16PM +0000, Dr. David Alan Gilbert wrote:
>>>>> * Peter Lieven (pl@kamp.de) wrote:
>>>>>> On 05.03.2018 at 12:45, Stefan Hajnoczi wrote:
>>>>>>> On Thu, Feb 22, 2018 at 12:13:50PM +0100, Peter Lieven wrote:
>>>>>>>> I stumbled across the MAX_INFLIGHT_IO constant that was introduced in 2015 and was curious why 512MB was chosen
>>>>>>>> as the readahead. I ask because I found that the source VM becomes very unresponsive I/O-wise
>>>>>>>> while the initial 512MB are read, and furthermore seems to stay unresponsive if we choose a high migration speed
>>>>>>>> and have fast storage on the destination VM.
>>>>>>>>
>>>>>>>> In our environment I modified this value to 16MB, which seems to work much more smoothly. I wonder if we should make
>>>>>>>> this a user-configurable value, or at least define a different rate limit for the block transfer in the bulk stage?
>>>>>>> I don't know if benchmarks were run when choosing the value.  From the
>>>>>>> commit description it sounds like the main purpose was to limit the
>>>>>>> amount of memory that can be consumed.
>>>>>>>
>>>>>>> 16 MB also fulfills that criterion :), but why is the source VM more
>>>>>>> responsive with a lower value?
>>>>>>>
>>>>>>> Perhaps the issue is queue depth on the storage device - the block
>>>>>>> migration code enqueues up to 512 MB worth of reads, and guest I/O has
>>>>>>> to wait?
>>>>>> That is my guess. Especially if the destination storage is faster, we basically always have
>>>>>> 512 I/Os in flight on the source storage.
>>>>>>
>>>>>> Does anyone mind if we reduce that value to 16MB, or do we need a better mechanism?
>>>>> We've got migration-parameters these days; you could connect it to one
>>>>> of those fairly easily I think.
>>>>> Try: grep -i 'cpu[-_]throttle[-_]initial'  for an example of one that's
>>>>> already there.
>>>>> Then you can set it to whatever you like.
>>>> It would be nice to solve the performance problem without adding a
>>>> tuneable.
>>>>
>>>> On the other hand, QEMU has no idea what the queue depth of the device
>>>> is.  Therefore it cannot prioritize guest I/O over block migration I/O.
>>>>
>>>> 512 parallel requests is much too high.  Most parallel I/O benchmarking
>>>> is done at 32-64 queue depth.
>>>>
>>>> I think that 16 parallel requests is a reasonable maximum number for a
>>>> background job.
>>>>
>>>> We need to be clear though that the purpose of this change is unrelated
>>>> to the original 512 MB memory footprint goal.  It just happens to touch
>>>> the same constant but the goal is now to submit at most 16 I/O requests
>>>> in parallel to avoid monopolizing the I/O device.
>>> I think we should really look at this. The variables that control if we stay in the while loop or not are incremented and decremented
>>> at the following places:
>>>
>>> mig_save_device_dirty:
>>> mig_save_device_bulk:
>>>     block_mig_state.submitted++;
>>>
>>> blk_mig_read_cb:
>>>     block_mig_state.submitted--;
>>>     block_mig_state.read_done++;
>>>
>>> flush_blks:
>>>     block_mig_state.read_done--;
>>>
>>> The condition of the while loop is:
>>> (block_mig_state.submitted +
>>>             block_mig_state.read_done) * BLOCK_SIZE <
>>>            qemu_file_get_rate_limit(f) &&
>>>            (block_mig_state.submitted +
>>>             block_mig_state.read_done) <
>>>            MAX_INFLIGHT_IO)
>>>
>>> First, I wonder whether we ever reach the rate limit, because we put the read buffers onto f only AFTER we exit the while loop?
>>>
>>> And even if we reach the limit we constantly maintain 512 I/Os in parallel because we immediately decrement read_done
>>> when we put the buffers to f in flush_blks. In the next iteration of the while loop we then read again until we have 512 in-flight I/Os.
>>>
>>> And shouldn't we have a time limit to limit the time we stay in the while loop? I think we artificially delay sending data to f?
>> Thinking about it for a while I would propose the following:
>>
>> a) rename MAX_INFLIGHT_IO to MAX_IO_BUFFERS
>> b) add MAX_PARALLEL_IO with a value of 16
>> c) compare qemu_file_get_rate_limit only with block_mig_state.read_done
>>
>> This would result in the following condition for the while loop:
>>
>> (block_mig_state.read_done * BLOCK_SIZE < qemu_file_get_rate_limit(f) &&
>>  (block_mig_state.submitted + block_mig_state.read_done) < MAX_IO_BUFFERS &&
>>  block_mig_state.submitted < MAX_PARALLEL_IO)
>>
>> Does that sound like a plan?
> That sounds good to me.

I will prepare patches for this.
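
To make the difference concrete, here is a rough standalone sketch of the two
loop conditions discussed above. It is not the actual patch and does not use
QEMU's real structures: MigCounters and fill_queue() are hypothetical names,
the rate limit is passed in as a plain number instead of calling
qemu_file_get_rate_limit(f), and the 1 MiB block size is an assumption that
matches the 512 MB figure in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values: 1 MiB blocks x 512 buffers gives the 512 MB discussed
 * in the thread; 16 is the proposed cap on parallel reads. */
#define BLOCK_SIZE      (1 << 20)
#define MAX_INFLIGHT_IO 512   /* current name */
#define MAX_IO_BUFFERS  512   /* proposed new name for the same limit */
#define MAX_PARALLEL_IO 16    /* proposed new limit */

/* Hypothetical stand-in for the two counters kept in block_mig_state. */
typedef struct {
    int submitted;  /* reads issued but not yet completed */
    int read_done;  /* reads completed but not yet flushed to the stream */
} MigCounters;

typedef bool (*MaySubmitFn)(const MigCounters *s, int64_t rate_limit);

/* Current condition: submitted + read_done counts against both limits. */
static bool old_may_submit(const MigCounters *s, int64_t rate_limit)
{
    int64_t total = (int64_t)s->submitted + s->read_done;

    return total * BLOCK_SIZE < rate_limit &&
           total < MAX_INFLIGHT_IO;
}

/* Proposed condition: only completed reads count against the rate limit,
 * and in-flight reads get their own, much smaller cap. */
static bool new_may_submit(const MigCounters *s, int64_t rate_limit)
{
    int64_t total = (int64_t)s->submitted + s->read_done;

    return (int64_t)s->read_done * BLOCK_SIZE < rate_limit &&
           total < MAX_IO_BUFFERS &&
           s->submitted < MAX_PARALLEL_IO;
}

/* Models one pass of the while loop with no completions arriving in
 * between: keep submitting reads until the condition says stop.
 * Returns the number of reads in flight afterwards. */
static int fill_queue(MigCounters *s, int64_t rate_limit, MaySubmitFn may_submit)
{
    assert(s->submitted >= 0 && s->read_done >= 0);

    while (may_submit(s, rate_limit)) {
        s->submitted++;
    }
    return s->submitted;
}
```

With a rate limit of 1 GiB and both counters starting at zero, fill_queue()
queues 512 reads under the old condition but stops at 16 under the proposed
one, which is the behaviour we want for guest I/O on the source.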

Peter
