Re: [PATCH v1 0/1] COLO: migrate dirty ram pages before colo checkpoint

From: Derek Su <jwsu1986@gmail.com>
To: Lukas Straub <lukasstraub2@web.de>
Cc: zhang.zhanghailiang@huawei.com, chyang@qnap.com,
	quintela@redhat.com, Derek Su <dereksu@qnap.com>,
	dgilbert@redhat.com, qemu-devel@nongnu.org, ctcheng@qnap.com
Subject: Re: [PATCH v1 0/1] COLO: migrate dirty ram pages before colo checkpoint
Date: Thu, 13 Aug 2020 18:27:32 +0800	[thread overview]
Message-ID: <CAFKS8hWT1ieRkZcV4Q1ReC4X2wAXKteWB1VkEB7NesT2T-s_Kg@mail.gmail.com> (raw)
In-Reply-To: <20200731094940.7a26e583@luklap>

On Fri, Jul 31, 2020 at 3:52 PM Lukas Straub <lukasstraub2@web.de> wrote:
>
> On Sun, 21 Jun 2020 10:10:03 +0800
> Derek Su <dereksu@qnap.com> wrote:
>
> > This series is to reduce the guest's downtime during colo checkpoint
> > by migrating dirty ram pages as many as possible before colo checkpoint.
> >
> > If the iteration count reaches COLO_RAM_MIGRATE_ITERATION_MAX or
> > ram pending size is lower than 'x-colo-migrate-ram-threshold',
> > stop the ram migration and do colo checkpoint.
> >
> > Test environment:
> > The both primary VM and secondary VM has 1GiB ram and 10GbE NIC
> > for FT traffic.
> > One fio buffer write job runs on the guest.
> > The result shows the total primary VM downtime is decreased by ~40%.
> >
> > Please help to review it and suggestions are welcomed.
> > Thanks.
>
> Hello Derek,
> Sorry for the late reply.
> I think this is not a good idea, because it unnecessarily introduces a delay between checkpoint request and the checkpoint itself and thus impairs network bound workloads due to increased network latency. Workloads that are independent from network don't cause many checkpoints anyway, so it doesn't help there either.
>

Hello, Lukas & Zhanghailiang

Thanks for your opinions.
I went through my patch, and I feel a little confused and would like
to dig into it more.

In this patch, colo_migrate_ram_before_checkpoint() is before
COLO_MESSAGE_CHECKPOINT_REQUEST,
so the SVM and PVM should not enter the pause state.

In the meanwhile, the packets to PVM/SVM can still be compared and
notify inconsistency if mismatched, right?
Is it possible to introduce extra network latency?

In my test (randwrite to disk by fio with direct=0),
the ping from another client to the PVM  using generic colo and colo
used this patch are below.
The network latency does not increase as my expectation.

generic colo
```
64 bytes from 192.168.80.18: icmp_seq=87 ttl=64 time=28.109 ms
64 bytes from 192.168.80.18: icmp_seq=88 ttl=64 time=16.747 ms
64 bytes from 192.168.80.18: icmp_seq=89 ttl=64 time=2388.779 ms
<----checkpoint start
64 bytes from 192.168.80.18: icmp_seq=90 ttl=64 time=1385.792 ms
64 bytes from 192.168.80.18: icmp_seq=91 ttl=64 time=384.896 ms
<----checkpoint end
64 bytes from 192.168.80.18: icmp_seq=92 ttl=64 time=3.895 ms
64 bytes from 192.168.80.18: icmp_seq=93 ttl=64 time=1.020 ms
64 bytes from 192.168.80.18: icmp_seq=94 ttl=64 time=0.865 ms
64 bytes from 192.168.80.18: icmp_seq=95 ttl=64 time=0.854 ms
64 bytes from 192.168.80.18: icmp_seq=96 ttl=64 time=28.359 ms
64 bytes from 192.168.80.18: icmp_seq=97 ttl=64 time=12.309 ms
64 bytes from 192.168.80.18: icmp_seq=98 ttl=64 time=0.870 ms
64 bytes from 192.168.80.18: icmp_seq=99 ttl=64 time=2371.733 ms
64 bytes from 192.168.80.18: icmp_seq=100 ttl=64 time=1371.440 ms
64 bytes from 192.168.80.18: icmp_seq=101 ttl=64 time=366.414 ms
64 bytes from 192.168.80.18: icmp_seq=102 ttl=64 time=0.818 ms
64 bytes from 192.168.80.18: icmp_seq=103 ttl=64 time=0.997 ms
```

colo used this patch
```
64 bytes from 192.168.80.18: icmp_seq=72 ttl=64 time=1.417 ms
64 bytes from 192.168.80.18: icmp_seq=73 ttl=64 time=0.931 ms
64 bytes from 192.168.80.18: icmp_seq=74 ttl=64 time=0.876 ms
64 bytes from 192.168.80.18: icmp_seq=75 ttl=64 time=1184.034 ms
<----checkpoint start
64 bytes from 192.168.80.18: icmp_seq=76 ttl=64 time=181.297 ms
<----checkpoint end
64 bytes from 192.168.80.18: icmp_seq=77 ttl=64 time=0.865 ms
64 bytes from 192.168.80.18: icmp_seq=78 ttl=64 time=0.858 ms
64 bytes from 192.168.80.18: icmp_seq=79 ttl=64 time=1.247 ms
64 bytes from 192.168.80.18: icmp_seq=80 ttl=64 time=0.946 ms
64 bytes from 192.168.80.18: icmp_seq=81 ttl=64 time=0.855 ms
64 bytes from 192.168.80.18: icmp_seq=82 ttl=64 time=0.868 ms
64 bytes from 192.168.80.18: icmp_seq=83 ttl=64 time=0.749 ms
64 bytes from 192.168.80.18: icmp_seq=84 ttl=64 time=2.154 ms
64 bytes from 192.168.80.18: icmp_seq=85 ttl=64 time=1499.186 ms
64 bytes from 192.168.80.18: icmp_seq=86 ttl=64 time=496.173 ms
64 bytes from 192.168.80.18: icmp_seq=87 ttl=64 time=0.854 ms
64 bytes from 192.168.80.18: icmp_seq=88 ttl=64 time=0.774 ms
```

Thank you.

Regards,
Derek

> Hailang did have a patch to migrate ram between checkpoints, which should help all workloads, but it wasn't merged back then. I think you can pick it up again, rebase and address David's and Eric's comments:
> https://lore.kernel.org/qemu-devel/20200217012049.22988-3-zhang.zhanghailiang@huawei.com/T/#u
>
> Hailang, are you ok with that?
>
> Regards,
> Lukas Straub