From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:60640)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1YksgC-0001dB-Uo for qemu-devel@nongnu.org;
    Wed, 22 Apr 2015 07:18:58 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1Yksg9-0002KF-L2 for qemu-devel@nongnu.org;
    Wed, 22 Apr 2015 07:18:56 -0400
Received: from mx1.redhat.com ([209.132.183.28]:54582)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1Yksg9-0002K7-89 for qemu-devel@nongnu.org;
    Wed, 22 Apr 2015 07:18:53 -0400
Date: Wed, 22 Apr 2015 12:18:34 +0100
From: "Dr. David Alan Gilbert"
Message-ID: <20150422111833.GD2386@work-vm>
References: <1427347774-8960-1-git-send-email-zhang.zhanghailiang@huawei.com>
    <5524E3E9.1070102@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <5524E3E9.1070102@huawei.com>
Subject: Re: [Qemu-devel] [RFC PATCH v4 00/28] COarse-grain LOck-stepping(COLO)
    Virtual Machines for Non-stop Service
To: zhanghailiang
Cc: lizhijian@cn.fujitsu.com, quintela@redhat.com, yunhong.jiang@intel.com,
    eddie.dong@intel.com, qemu-devel@nongnu.org, peter huangpeng,
    arei.gonglei@huawei.com, amit.shah@redhat.com,
    david@gibson.dropbear.id.au

* zhanghailiang (zhang.zhanghailiang@huawei.com) wrote:
> Hi,
>
> ping ...

I will get to look at this again; but not until after next week.

> The main blocked bugs for COLO have been solved,

I've got the v3 set running, but the biggest problems I hit are with
the packet comparison module; I've seen a panic which I think is
in colo_send_checkpoint_req and which I think is due to the use of
GFP_KERNEL to allocate the netlink message, since I think it can schedule
there. I tried making that a GFP_ATOMIC but I'm hitting other
problems with:

kcolo_thread, no conn, schedule out

that I've not had time to look into yet.
So I only get about a 50% success rate of starting COLO.
I see there is stuff in the TODO of the colo-proxy that
seems to say the netlink stuff should change, maybe you're already fixing
that?

> we also have finished some new features and optimization on COLO. (If you are interested in this,
> we can send them to you in private ;))
> For easy of review, it is better to keep it simple now, so we will not add too much new codes into this frame
> patch set before it been totally reviewed.

I'd like to see those; but I don't want to take code privately.
It's OK to post extra stuff as a separate set.

> COLO is a totally new feature which is still in early stage, we hope to speed up the development,
> so your comments and feedback are warmly welcomed. :)

Yes, it's getting there though; I don't think anyone else has
got this close to getting a full FT set working with disk and networking.

Dave

>
> Thanks,
> zhanghailiang
>
> On 2015/3/26 13:29, zhanghailiang wrote:
> >This is the 4th version of COLO, here is only COLO frame part, include: VM checkpoint,
> >failover, proxy API, block replication API, not include block replication.
> >The block part has been sent by wencongyang:
> >[RFC PATCH COLO v2 00/13] Block replication for continuous checkpoints
> >
> >Compared with last version, there aren't too much optimize and new functions.
> >The main reason is that there is an known issue that still unsolved, we found
> >some dirty pages which have been missed setting bit in corresponding bitmap.
> >And it will trigger strange problem in VM.
> >We hope to resolve it before add more codes.
> >
> >You can get the newest integrated qemu colo patches from github:
> >https://github.com/coloft/qemu/commits/colo-v1.1
> >
> >About how to test COLO, Please reference to the follow link.
> >http://wiki.qemu.org/Features/COLO.
> >
> >Please review and test.
> >
> >Known issue still unsolved:
> >(1) Some pages dirtied without setting its corresponding dirty-bitmap.
> >
> >Previous posted RFC patch series:
> >http://lists.nongnu.org/archive/html/qemu-devel/2014-06/msg05567.html
> >http://lists.nongnu.org/archive/html/qemu-devel/2014-09/msg04459.html
> >https://lists.gnu.org/archive/html/qemu-devel/2015-02/msg04771.html
> >
> >TODO list:
> >1 Optimize the process of checkpoint, shorten the time-consuming:
> >   (Partly done, patch is not include into this series)
> >    1) separate ram and device save/load process to reduce size of extra memory
> >       used during checkpoint
> >    2) live migrate part of dirty pages to slave during sleep time.
> >2 Add more debug/stat info
> >   (Partly done, patch is not include into this series)
> >   include checkpoint count, proxy discompare count, downtime,
> >   number of live migrated pages, total sent pages, etc.
> >3 Strengthen failover
> >4 optimize proxy part, include proxy script.
> >5 The capability of continuous FT
> >
> >v4:
> >- New block replication scheme (use image-fleecing for secondary side)
> >- Address some comments from Eric Blake and Dave
> >- Add command colo-set-checkpoint-period to set the time of periodic checkpoint
> >- Add a delay (100ms) between continuous checkpoint requests to ensure VM
> >  run 100ms at least since last pause.
> >
> >v3:
> >- use proxy instead of colo agent to compare network packets
> >- add block replication
> >- Optimize failover disposal
> >- handle shutdown
> >
> >v2:
> >- use QEMUSizedBuffer/QEMUFile as COLO buffer
> >- colo support is enabled by default
> >- add nic replication support
> >- addressed comments from Eric Blake and Dr. David Alan Gilbert
> >
> >v1:
> >- implement the frame of colo
> >
> >Wen Congyang (1):
> >  COLO: Add block replication into colo process
> >
> >zhanghailiang (27):
> >  configure: Add parameter for configure to enable/disable COLO support
> >  migration: Introduce capability 'colo' to migration
> >  COLO: migrate colo related info to slave
> >  migration: Integrate COLO checkpoint process into migration
> >  migration: Integrate COLO checkpoint process into loadvm
> >  COLO: Implement colo checkpoint protocol
> >  COLO: Add a new RunState RUN_STATE_COLO
> >  QEMUSizedBuffer: Introduce two help functions for qsb
> >  COLO: Save VM state to slave when do checkpoint
> >  COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily
> >  COLO VMstate: Load VM state into qsb before restore it
> >  arch_init: Start to trace dirty pages of SVM
> >  COLO RAM: Flush cached RAM into SVM's memory
> >  COLO failover: Introduce a new command to trigger a failover
> >  COLO failover: Implement COLO master/slave failover work
> >  COLO failover: Don't do failover during loading VM's state
> >  COLO: Add new command parameter 'colo_nicname' 'colo_script' for net
> >  COLO NIC: Init/remove colo nic devices when add/cleanup tap devices
> >  COLO NIC: Implement colo nic device interface configure()
> >  COLO NIC : Implement colo nic init/destroy function
> >  COLO NIC: Some init work related with proxy module
> >  COLO: Do checkpoint according to the result of net packets comparing
> >  COLO: Improve checkpoint efficiency by do additional periodic
> >    checkpoint
> >  COLO: Add colo-set-checkpoint-period command
> >  COLO NIC: Implement NIC checkpoint and failover
> >  COLO: Disable qdev hotplug when VM is in COLO mode
> >  COLO: Implement shutdown checkpoint
> >
> > arch_init.c                            | 199 +++++++-
> > configure                              |  14 +
> > hmp-commands.hx                        |  30 ++
> > hmp.c                                  |  14 +
> > hmp.h                                  |   2 +
> > include/exec/cpu-all.h                 |   1 +
> > include/migration/migration-colo.h     |  58 +++
> > include/migration/migration-failover.h |  22 +
> > include/migration/migration.h          |   3 +
> > include/migration/qemu-file.h          |   3 +-
> > include/net/colo-nic.h                 |  25 +
> > include/net/net.h                      |   4 +
> > include/sysemu/sysemu.h                |   3 +
> > migration/Makefile.objs                |   2 +
> > migration/colo-comm.c                  |  80 ++++
> > migration/colo-failover.c              |  48 ++
> > migration/colo.c                       | 809 +++++++++++++++++++++++++++++++++
> > migration/migration.c                  |  60 ++-
> > migration/qemu-file-buf.c              |  58 +++
> > net/Makefile.objs                      |   1 +
> > net/colo-nic.c                         | 438 ++++++++++++++++++
> > net/tap.c                              |  45 +-
> > qapi-schema.json                       |  42 +-
> > qemu-options.hx                        |  10 +-
> > qmp-commands.hx                        |  41 ++
> > savevm.c                               |   2 +-
> > scripts/colo-proxy-script.sh           |  97 ++++
> > stubs/Makefile.objs                    |   1 +
> > stubs/migration-colo.c                 |  58 +++
> > vl.c                                   |  36 +-
> > 30 files changed, 2178 insertions(+), 28 deletions(-)
> > create mode 100644 include/migration/migration-colo.h
> > create mode 100644 include/migration/migration-failover.h
> > create mode 100644 include/net/colo-nic.h
> > create mode 100644 migration/colo-comm.c
> > create mode 100644 migration/colo-failover.c
> > create mode 100644 migration/colo.c
> > create mode 100644 net/colo-nic.c
> > create mode 100755 scripts/colo-proxy-script.sh
> > create mode 100644 stubs/migration-colo.c
> >
> >
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK