From: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, Den Lunev <den@openvz.org>,
	Eric Blake <eblake@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Juan Quintela <quintela@redhat.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	Markus Armbruster <armbru@redhat.com>
Subject: Re: [PATCH v3 3/7] support UFFD write fault processing in ram_save_iterate()
Date: Tue, 24 Nov 2020 20:40:00 +0300
Message-ID: <71c30a1e-8fbd-1eb3-aa5f-2efc973c0647@virtuozzo.com>
In-Reply-To: <20201124151729.GA257984@xz-x1>

On 24.11.2020 18:17, Peter Xu wrote:
> On Tue, Nov 24, 2020 at 11:02:09AM +0300, Andrey Gruzdev wrote:
>> On 24.11.2020 00:34, Peter Xu wrote:
>>> On Fri, Nov 20, 2020 at 07:53:34PM +0300, Andrey Gruzdev wrote:
>>>> On 20.11.2020 19:43, Peter Xu wrote:
>>>>> On Fri, Nov 20, 2020 at 07:15:07PM +0300, Andrey Gruzdev wrote:
>>>>>> Yeah, I think we can re-use the postcopy queue code for faulting pages. I'm
>>>>>> worrying a little about the additional overhead of dealing with the
>>>>>> urgent-request semaphore. Also, the code won't change a lot; something like:
>>>>>>
>>>>>> [...]
>>>>>>            /* In case of 'write-tracking' migration we first try
>>>>>>             * to poll UFFD and see if we have a write page fault event */
>>>>>>            poll_fault_page(rs);
>>>>>>
>>>>>>            again = true;
>>>>>>            found = get_queued_page(rs, &pss);
>>>>>>
>>>>>>            if (!found) {
>>>>>>                /* priority queue empty, so just search for something dirty */
>>>>>>                found = find_dirty_block(rs, &pss, &again);
>>>>>>            }
>>>>>> [...]
>>>>>
>>>>> Could I ask what's the "urgent request semaphore"?  Thanks,
>>>>>
>>>>
>>>> These functions use it (the correct name is 'rate_limit_sem'):
>>>>
>>>> void migration_make_urgent_request(void)
>>>> {
>>>>       qemu_sem_post(&migrate_get_current()->rate_limit_sem);
>>>> }
>>>>
>>>> void migration_consume_urgent_request(void)
>>>> {
>>>>       qemu_sem_wait(&migrate_get_current()->rate_limit_sem);
>>>> }
>>>>
>>>> They are called from ram_save_queue_pages() and unqueue_page(), respectively,
>>>> to control the migration rate limiter.
>>>>
>>>> bool migration_rate_limit(void)
>>>> {
>>>> [...]
>>>>           /*
>>>>            * Wait for a delay to do rate limiting OR
>>>>            * something urgent to post the semaphore.
>>>>            */
>>>>           int ms = s->iteration_start_time + BUFFER_DELAY - now;
>>>>           trace_migration_rate_limit_pre(ms);
>>>>           if (qemu_sem_timedwait(&s->rate_limit_sem, ms) == 0) {
>>>>               /*
>>>>                * We were woken by one or more urgent things but
>>>>                * the timedwait will have consumed one of them.
>>>>                * The service routine for the urgent wake will dec
>>>>                * the semaphore itself for each item it consumes,
>>>>                * so add this one we just eat back.
>>>>                */
>>>>               qemu_sem_post(&s->rate_limit_sem);
>>>>               urgent = true;
>>>>           }
>>>> [...]
>>>> }
>>>>
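>>>> To illustrate the pairing outside of QEMU, here is a minimal standalone
>>>> sketch with plain POSIX semaphores (not the qemu_sem_* wrappers): a
>>>> successful timedwait consumes one posted token, so it has to be given
>>>> back for the consumer in unqueue_page() to balance:
>>>>
>>>> #include <semaphore.h>
>>>> #include <stdbool.h>
>>>> #include <time.h>
>>>>
>>>> /* Sleep up to @ms for rate limiting, but wake early if an urgent
>>>>  * request has been posted.  The timedwait eats one token on wakeup,
>>>>  * so we re-post it: the service routine that dequeues the urgent
>>>>  * item will consume it itself. */
>>>> static bool rate_limit_wait(sem_t *urgent, int ms)
>>>> {
>>>>     struct timespec ts;
>>>>
>>>>     clock_gettime(CLOCK_REALTIME, &ts);
>>>>     ts.tv_sec += ms / 1000;
>>>>     ts.tv_nsec += (long)(ms % 1000) * 1000000L;
>>>>     if (ts.tv_nsec >= 1000000000L) {
>>>>         ts.tv_sec++;
>>>>         ts.tv_nsec -= 1000000000L;
>>>>     }
>>>>     if (sem_timedwait(urgent, &ts) == 0) {
>>>>         sem_post(urgent);      /* give the consumed token back */
>>>>         return true;           /* urgent work is pending */
>>>>     }
>>>>     return false;              /* plain timeout, no urgent work */
>>>> }
>>>>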
>>>
>>> Hmm... why could its overhead be a problem?  If it's an overhead that can be
>>> avoided, then postcopy might want that too.
>>>
>>> The thing is I really feel like the snapshot logic can leverage the whole idea
>>> of existing postcopy (like get_queued_page/unqueue_page; it's probably due to
>>> the fact that both of them want to "migrate some more urgent pages than the
>>> background migration, due to either missing/wrprotected pages"), but I might
>>> be missing something.
>>>
>>> Thanks,
>>>
>>
>> I don't think this overhead itself is a problem, since it's negligible
>> compared to other stuff. You're undoubtedly correct about using the postcopy
>> idea in the case when wr-fault pages arrive from a separate thread or
>> external source. Then we really need to decouple that separate thread or
>> external requestor from the migration thread.
>>
>> In this patch series wr-faults arrive in the same migration loop as the
>> normal scan, so I don't see a direct reason to pass them through the queue,
>> but I might have missed something from your proposal.
> 
> I see your point.  For me, whether to use a thread is not extremely important;
> actually we could have a standalone thread handle the vcpu faults too, just
> like postcopy, only run on the src host.  My major point is that we should
> avoid introducing the extra pss logic, because fundamentally it's doing the
> same thing as get_queued_page() right now, unless I'm wrong...
> 
> So I think your previous poll_fault_page() call is okay; assuming the same poll
> model as you used, I mean something like this:
> 
> ------8<-------
> diff --git a/migration/ram.c b/migration/ram.c
> index 7811cde643..718dd276c7 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -1473,6 +1473,14 @@ static bool get_queued_page(RAMState *rs, PageSearchStatus *pss)
>   
>       } while (block && !dirty);
>   
> +    if (!block) {
> +        /*
> +         * Set the block too if live snapshot is enabled; that's when we
> +         * have vcpus blocked by the wr-protected pages.
> +         */
> +        block = poll_fault_page(rs, &offset);
> +    }
> +
>       if (block) {
>           /*
>            * As soon as we start servicing pages out of order, then we have
> ------8<-------
> 
> So as long as we set the block and offset, pss will be updated just like
> postcopy.  That's exactly what we want, am I right?
> 

I also think this is the best place to put the polling call, so we don't touch
pss yet again.
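
For reference, here is a rough sketch of how such a poll_fault_page() could
look on my side. The names are assumptions, not the actual patch code: the
uffd file descriptor field in RAMState and the lookup via
qemu_ram_block_from_host() are just how I'd sketch it.

#include <poll.h>
#include <unistd.h>
#include <linux/userfaultfd.h>

/* Non-blocking check for a pending write-protect fault.  Returns the
 * RAMBlock containing the faulting page and fills @offset, or NULL if
 * no event is queued on the userfaultfd. */
static RAMBlock *poll_fault_page(RAMState *rs, ram_addr_t *offset)
{
    struct pollfd pfd = { .fd = rs->uffd_fd, .events = POLLIN };
    struct uffd_msg msg;

    if (poll(&pfd, 1, 0) <= 0) {
        return NULL;                      /* nothing pending */
    }
    if (read(rs->uffd_fd, &msg, sizeof(msg)) != sizeof(msg) ||
        msg.event != UFFD_EVENT_PAGEFAULT) {
        return NULL;
    }
    /* Map the faulting host address back to a guest RAMBlock. */
    return qemu_ram_block_from_host(
               (void *)(uintptr_t)msg.arg.pagefault.address,
               false, offset);
}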

>>
>> Do you mean that if we use the postcopy logic, we'll allocate memory, make
>> copies of the faulting pages and then immediately un-protect them?
>> Or do we just put the faulting page address into the queued item and release
>> protection after the page content has been saved?
> 
> I think the current approach would be fine: we don't copy the page, and we
> un-protect in a single place after the page is dumped to the precopy stream.
> If you could measure the latencies, that would be even better; then we'll
> know what to expect.
> 
> IIRC the "copy page first" idea came from Denis's initial version, which I
> thought was too aggressive, since potentially it can eat twice the memory on
> the host for a single guest, not to mention when a live snapshot is taken of
> multiple guests on the same host.  It's just not something the user would
> expect when taking a snapshot.  That's something we can do on top of this
> work, imho, if we see very high latency on resolving the page faults.  But
> that's still the last thing to try (or at least a "by default off" option),
> so we can still think about some other solution before trying to eat too
> much host mem.
> 
> Thanks,
> 

I totally agree that creating shadow copies for each faulting page is not a
very robust idea, since we can never know the 'proper' limit for the buffer
memory we should allocate and carve out of the host; it depends on hardware,
guest workload, host I/O intensity, too many things.

So I believe the file/block backend side is a better place to provide some
buffering and to manage the memory/latency tradeoff. It also keeps the change
less 'invasive' to QEMU code itself.
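
To make the agreed ordering concrete, here's a sketch of the un-protect step
that would run right after a faulting page has been put on the stream. It
uses the plain UFFDIO_WRITEPROTECT ioctl; the helper name and signature are
mine for illustration, not the patch code.

#include <linux/userfaultfd.h>
#include <sys/ioctl.h>
#include <stdint.h>

/* Drop write protection from a single page once its contents are
 * already saved into the precopy stream, so a blocked vCPU can
 * resume.  Error handling trimmed for brevity. */
static int uffd_unprotect_page(int uffd, void *host_addr, uint64_t page_size)
{
    struct uffdio_writeprotect wp = {
        .range = {
            .start = (uint64_t)(uintptr_t)host_addr,
            .len   = page_size,
        },
        .mode = 0,    /* no _MODE_WP flag: clears the protection */
    };
    return ioctl(uffd, UFFDIO_WRITEPROTECT, &wp);
}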

Thanks,

-- 
Andrey Gruzdev, Principal Engineer
Virtuozzo GmbH  +7-903-247-6397
                virtuozzo.com

