From: Andrey Gruzdev <andrey.gruzdev@virtuozzo.com>
To: Peter Xu <peterx@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>,
	Markus Armbruster <armbru@redhat.com>,
	qemu-devel@nongnu.org,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>, Den Lunev <den@openvz.org>
Subject: Re: [PATCH v6 0/4] migration: UFFD write-tracking migration/snapshots
Date: Tue, 15 Dec 2020 22:53:13 +0300
Message-ID: <2a1f164c-94ab-0d35-96c0-792524d9ef30@virtuozzo.com>
In-Reply-To: <20201211150940.GC6520@xz-x1>

On 11.12.2020 18:09, Peter Xu wrote:
> On Fri, Dec 11, 2020 at 04:13:02PM +0300, Andrey Gruzdev wrote:
>> I've also made wr-fault resolution latency measurements for the case when the migration
>> stream is dumped to a file in cached mode. It should approximately match saving to the
>> file fd directly, though I used 'migrate exec:<>' with a hand-written tool.
>>
>> VM config is 6 vCPUs + 16GB RAM, qcow2 image on Seagate 7200.11 series 1.5TB HDD,
>> snapshot goes to the same disk. Guest is Windows 10.
>>
>> The test scenario is playing full-HD youtube video in Firefox while saving snapshot.
>>
>> Latency measurement begin/end points are fs/userfaultfd.c:handle_userfault() and
>> mm/userfaultfd.c:mwriteprotect_range(), respectively. For any faulting page, the
>> oldest wr-fault timestamp is accounted.
>>
>> The whole time to take snapshot was ~30secs, file size is around 3GB.
>> So far it seems to be not a very bad picture. However, the 16-255 msec range is worrying
>> me a bit; it seems to cause audio backend buffer underflows sometimes.
>>
>>
>>       msecs               : count     distribution
>>           0 -> 1          : 111755   |****************************************|
>>           2 -> 3          : 52       |                                        |
>>           4 -> 7          : 105      |                                        |
>>           8 -> 15         : 428      |                                        |
>>          16 -> 31         : 335      |                                        |
>>          32 -> 63         : 4        |                                        |
>>          64 -> 127        : 8        |                                        |
>>         128 -> 255        : 5        |                                        |
> Great test!  Thanks for sharing this information.
>
> Yes, it's good enough for a 1st version, so it's already more than just
> functionally working. :)
>
> So did you try your previous patch to see whether it could improve things in some
> way?  Again, we can gradually optimize upon your current work.
>
> Btw, you reminded me: why not track all of this from the kernel? :) That's a
> good idea.  So, how did you trace it yourself?  Something like below should
> work with bpftrace, but I have the feeling you did it some other way, so just
> fyi:
>
>          # cat latency.bpf
>          kprobe:handle_userfault
>          {
>                  @start[tid] = nsecs;
>          }
>
>          kretprobe:handle_userfault
>          {
>                  if (@start[tid]) {
>                          $delay = nsecs - @start[tid];
>                          delete(@start[tid]);
>                          @delay_us = hist($delay / 1000);
>                  }
>          }
>          # bpftrace latency.bpf
>
> Tracing the return of handle_userfault() could be more accurate in that it also
> includes the latency from UFFDIO_WRITEPROTECT until the vcpu gets woken up again.
> However it's inaccurate because, after a recent change to this code path in
> commit f9bf352224d7 ("userfaultfd: simplify fault handling", 2020-08-03),
> handle_userfault() can return even before the page fault is resolved.  It
> should still be good enough in most cases, because even when that happens the
> vcpu just faults into handle_userfault() again and we get one more count.
>
> Thanks!
>
Peter, thanks for the idea. I've now also tried it with a kretprobe, for Windows 10
and Ubuntu 20.04 guests, two runs for each. Windows is ugly here :(

First, a series of runs without scan-rate-limiting.patch.
Windows 10:

      msecs               : count     distribution
          0 -> 1          : 131913   |****************************************|
          2 -> 3          : 106      |                                        |
          4 -> 7          : 362      |                                        |
          8 -> 15         : 619      |                                        |
         16 -> 31         : 28       |                                        |
         32 -> 63         : 1        |                                        |
         64 -> 127        : 2        |                                        |


      msecs               : count     distribution
          0 -> 1          : 199273   |****************************************|
          2 -> 3          : 190      |                                        |
          4 -> 7          : 425      |                                        |
          8 -> 15         : 927      |                                        |
         16 -> 31         : 69       |                                        |
         32 -> 63         : 3        |                                        |
         64 -> 127        : 16       |                                        |
        128 -> 255        : 2        |                                        |

Ubuntu 20.04:

      msecs               : count     distribution
          0 -> 1          : 104954   |****************************************|
          2 -> 3          : 9        |                                        |

      msecs               : count     distribution
          0 -> 1          : 147159   |****************************************|
          2 -> 3          : 13       |                                        |
          4 -> 7          : 0        |                                        |
          8 -> 15         : 0        |                                        |
         16 -> 31         : 0        |                                        |
         32 -> 63         : 0        |                                        |
         64 -> 127        : 1        |                                        |


Here are runs with scan-rate-limiting.patch.
Windows 10:

      msecs               : count     distribution
          0 -> 1          : 234492   |****************************************|
          2 -> 3          : 66       |                                        |
          4 -> 7          : 219      |                                        |
          8 -> 15         : 109      |                                        |
         16 -> 31         : 0        |                                        |
         32 -> 63         : 0        |                                        |
         64 -> 127        : 1        |                                        |

      msecs               : count     distribution
          0 -> 1          : 183171   |****************************************|
          2 -> 3          : 109      |                                        |
          4 -> 7          : 281      |                                        |
          8 -> 15         : 444      |                                        |
         16 -> 31         : 3        |                                        |
         32 -> 63         : 1        |                                        |

Ubuntu 20.04:

      msecs               : count     distribution
          0 -> 1          : 92224    |****************************************|
          2 -> 3          : 9        |                                        |
          4 -> 7          : 0        |                                        |
          8 -> 15         : 0        |                                        |
         16 -> 31         : 1        |                                        |
         32 -> 63         : 0        |                                        |
         64 -> 127        : 1        |                                        |

      msecs               : count     distribution
          0 -> 1          : 97021    |****************************************|
          2 -> 3          : 7        |                                        |
          4 -> 7          : 0        |                                        |
          8 -> 15         : 0        |                                        |
         16 -> 31         : 0        |                                        |
         32 -> 63         : 0        |                                        |
         64 -> 127        : 0        |                                        |
        128 -> 255        : 1        |                                        |

So, the initial variant of rate-limiting has some positive effect, but not a very
noticeable one. The interesting case is the Windows guest: it's hard to say why the
difference compared to Linux is so large. Theoretically the reason might be some of
the virtio or QXL drivers. At least the Windows VM has been configured with a set of
Hyper-V enlightenments, so there's nothing left to improve in the domain config.

For Linux guests, latencies are good enough without any additional effort.

Also, I've missed some code to deal with snapshotting of a suspended guest, so I'll make
a v7 series with that fix and also try to add a more effective solution to reduce the
millisecond-grade latencies.

And yes, I've used a bpftrace-like tool: BCC from iovisor with the Python frontend. It
seems a bit more friendly than bpftrace.
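
In case it's useful, here is a minimal sketch of what the BCC side can look like
(just an illustration assuming the stock iovisor/bcc Python bindings, not my exact
tool): a kprobe/kretprobe pair on handle_userfault() building the same kind of
per-tid log2 latency histogram as the bpftrace script above.

    #!/usr/bin/env python3
    # Sketch: latency histogram for handle_userfault() via BCC kprobe/kretprobe.
    from bcc import BPF
    from time import sleep

    bpf_text = r"""
    #include <uapi/linux/ptrace.h>

    BPF_HASH(start, u32, u64);
    BPF_HISTOGRAM(delay_ms);

    int trace_entry(struct pt_regs *ctx)
    {
        u32 tid = bpf_get_current_pid_tgid();    /* low 32 bits = tid */
        u64 ts = bpf_ktime_get_ns();
        start.update(&tid, &ts);
        return 0;
    }

    int trace_return(struct pt_regs *ctx)
    {
        u32 tid = bpf_get_current_pid_tgid();
        u64 *tsp = start.lookup(&tid);
        if (tsp == 0)
            return 0;                            /* missed the entry probe */
        u64 delta = bpf_ktime_get_ns() - *tsp;
        start.delete(&tid);
        delay_ms.increment(bpf_log2l(delta / 1000000));
        return 0;
    }
    """

    b = BPF(text=bpf_text)
    b.attach_kprobe(event="handle_userfault", fn_name="trace_entry")
    b.attach_kretprobe(event="handle_userfault", fn_name="trace_return")

    print("Tracing handle_userfault()... hit Ctrl-C to print the histogram")
    try:
        sleep(99999999)
    except KeyboardInterrupt:
        pass
    b["delay_ms"].print_log2_hist("msecs")

Run it as root while taking the snapshot; the histogram printed on Ctrl-C uses
the same msec buckets as the ones above.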

-- 
Andrey Gruzdev, Principal Engineer
Virtuozzo GmbH  +7-903-247-6397
                 virtuzzo.com



