* About migration/colo issue
@ 2020-05-15 3:16 Zhang, Chen
2020-05-15 3:28 ` Zhanghailiang
2020-05-15 7:12 ` Lukas Straub
0 siblings, 2 replies; 5+ messages in thread
From: Zhang, Chen @ 2020-05-15 3:16 UTC (permalink / raw)
To: zhanghailiang, Dr . David Alan Gilbert, qemu-devel, Li Zhijian
Cc: Jason Wang, Lukas Straub
Hi Hailiang/Dave.
I found an urgent problem in the current upstream code: COLO gets stuck at the secondary checkpoint and afterwards.
The guest hangs because of this issue.
I bisected the upstream code; this issue was introduced by Hailiang's optimization patch:
From 0393031a16735835a441b6d6e0495a1bd14adb90 Mon Sep 17 00:00:00 2001
From: zhanghailiang <zhang.zhanghailiang@huawei.com>
Date: Mon, 24 Feb 2020 14:54:10 +0800
Subject: [PATCH] COLO: Optimize memory back-up process
This patch will reduce the downtime of VM for the initial process,
Previously, we copied all these memory in preparing stage of COLO
while we need to stop VM, which is a time-consuming process.
Here we optimize it by a trick, back-up every page while in migration
process while COLO is enabled, though it affects the speed of the
migration, but it obviously reduce the downtime of back-up all SVM'S
memory in COLO preparing stage.
Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Message-Id: <20200224065414.36524-5-zhang.zhanghailiang@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
minor typo fixes
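The optimization that commit describes, backing up each guest page the first time it is written during migration instead of copying all RAM while the VM is stopped, can be sketched as below. This is a simplified, hypothetical model for illustration only (all types and names here are made up), not the actual QEMU implementation:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096
#define NUM_PAGES 8

/* Hypothetical model: live guest RAM plus a backup cache that is filled
 * lazily, one page at a time, on the first write to each page. */
typedef struct {
    unsigned char ram[NUM_PAGES][PAGE_SIZE];   /* live guest RAM */
    unsigned char cache[NUM_PAGES][PAGE_SIZE]; /* per-page backup copy */
    int backed_up[NUM_PAGES];                  /* 1 once a page is saved */
} RamState;

/* Save the old contents of a page before the first write touches it. */
static void backup_page_before_write(RamState *s, int page)
{
    if (!s->backed_up[page]) {
        memcpy(s->cache[page], s->ram[page], PAGE_SIZE);
        s->backed_up[page] = 1;
    }
}

/* Guest write path: back up the original page once, then apply the write.
 * The copy cost is paid incrementally during migration, not all at once
 * while the VM is stopped -- which is the downtime the commit reduces. */
void guest_write(RamState *s, int page, int offset, unsigned char v)
{
    backup_page_before_write(s, page);
    s->ram[page][offset] = v;
}
```

The trade-off stated in the commit message falls out of this structure: each guest write during migration may pay an extra page copy (slower migration), but the COLO preparing stage no longer needs to copy all of RAM under a stopped VM.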
Hailiang, do you have time to look into it?
The detailed log:
Primary node:
13322@1589511271.917346:colo_receive_message Receive 'checkpoint-ready' message
{"timestamp": {"seconds": 1589511271, "microseconds": 917406}, "event": "RESUME"}
13322@1589511271.917842:colo_vm_state_change Change 'stop' => 'run'
13322@1589511291.243346:colo_send_message Send 'checkpoint-request' message
13322@1589511291.243978:colo_receive_message Receive 'checkpoint-reply' message
{"timestamp": {"seconds": 1589511291, "microseconds": 244096}, "event": "STOP"}
13322@1589511291.244444:colo_vm_state_change Change 'run' => 'stop'
13322@1589511291.244561:colo_send_message Send 'vmstate-send' message
13322@1589511291.258594:colo_send_message Send 'vmstate-size' message
13322@1589511305.412479:colo_receive_message Receive 'vmstate-received' message
13322@1589511309.031826:colo_receive_message Receive 'vmstate-loaded' message
{"timestamp": {"seconds": 1589511309, "microseconds": 31862}, "event": "RESUME"}
13322@1589511309.033075:colo_vm_state_change Change 'stop' => 'run'
{"timestamp": {"seconds": 1589511311, "microseconds": 111617}, "event": "VNC_CONNECTED", "data": {"server": {"auth": "none", "family": "ipv4", "service": "5907", "host": "0.0.0.0", "websocket": false}, "client": {"family": "ipv4", "service": "51564", "host": "10.239.13.19", "websocket": false}}}
{"timestamp": {"seconds": 1589511311, "microseconds": 116197}, "event": "VNC_INITIALIZED", "data": {"server": {"auth": "none", "family": "ipv4", "service": "5907", "host": "0.0.0.0", "websocket": false}, "client": {"family": "ipv4", "service": "51564", "host": "10.239.13.19", "websocket": false}}}
13322@1589511311.243271:colo_send_message Send 'checkpoint-request' message
13322@1589511311.351361:colo_receive_message Receive 'checkpoint-reply' message
{"timestamp": {"seconds": 1589511311, "microseconds": 351439}, "event": "STOP"}
13322@1589511311.415779:colo_vm_state_change Change 'run' => 'stop'
13322@1589511311.416001:colo_send_message Send 'vmstate-send' message
13322@1589511311.418620:colo_send_message Send 'vmstate-size' message
Secondary node:
{"timestamp": {"seconds": 1589510920, "microseconds": 778207}, "event": "RESUME"}
23619@1589510920.778835:colo_vm_state_change Change 'stop' => 'run'
23619@1589510920.778891:colo_send_message Send 'checkpoint-ready' message
23619@1589510940.105539:colo_receive_message Receive 'checkpoint-request' message
{"timestamp": {"seconds": 1589510940, "microseconds": 105712}, "event": "STOP"}
23619@1589510940.105917:colo_vm_state_change Change 'run' => 'stop'
23619@1589510940.105971:colo_send_message Send 'checkpoint-reply' message
23619@1589510940.106767:colo_receive_message Receive 'vmstate-send' message
23619@1589510940.122808:colo_flush_ram_cache_begin dirty_pages 2456
23619@1589510953.618672:colo_flush_ram_cache_end
23619@1589510953.945083:colo_receive_message Receive 'vmstate-size' message
23619@1589510954.274816:colo_send_message Send 'vmstate-received' message
qemu-system-x86_64: warning: TSC frequency mismatch between VM (2792980 kHz) and host (2925999 kHz), and TSC scaling unavailable
{"timestamp": {"seconds": 1589510957, "microseconds": 754184}, "event": "RESUME"}
23619@1589510957.894113:colo_vm_state_change Change 'stop' => 'run'
23619@1589510957.894162:colo_send_message Send 'vmstate-loaded' message
23619@1589510960.105977:colo_receive_message Receive 'checkpoint-request' message
{"timestamp": {"seconds": 1589510960, "microseconds": 106148}, "event": "STOP"}
23619@1589510960.213773:colo_vm_state_change Change 'run' => 'stop'
23619@1589510960.213797:colo_send_message Send 'checkpoint-reply' message
23619@1589510960.278771:colo_receive_message Receive 'vmstate-send' message
23619@1589510960.423268:colo_flush_ram_cache_begin dirty_pages 25
^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: About migration/colo issue
2020-05-15 3:16 About migration/colo issue Zhang, Chen
@ 2020-05-15 3:28 ` Zhanghailiang
2020-05-15 3:32 ` Zhang, Chen
2020-05-15 7:12 ` Lukas Straub
1 sibling, 1 reply; 5+ messages in thread
From: Zhanghailiang @ 2020-05-15 3:28 UTC (permalink / raw)
To: Zhang, Chen, Dr . David Alan Gilbert, qemu-devel, Li Zhijian
Cc: Jason Wang, Lukas Straub
Hi Zhang Chen,
From your tracing log, it seems to hang in colo_flush_ram_cache()?
Could it be stuck in a dead loop there?
I'll test it with the latest QEMU.
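If it is a dead loop, one plausible shape is a dirty-bitmap scan that never makes progress. The following is a simplified, hypothetical model of such a flush loop (not the real colo_flush_ram_cache() code; all names are made up) showing where the termination condition lives:

```c
#include <assert.h>

/* Hypothetical model of a dirty-page flush loop: walk the dirty bitmap,
 * copy each dirty page from the cache into guest RAM, and clear its
 * dirty bit. The loop terminates only because every iteration clears a
 * bit; if a bit is ever left set, the scan never finishes. */
#define NUM_PAGES 64

static unsigned char dirty[NUM_PAGES];

/* Find the next dirty page at or after 'start'; NUM_PAGES if none. */
static int find_next_dirty(int start)
{
    for (int i = start; i < NUM_PAGES; i++) {
        if (dirty[i]) {
            return i;
        }
    }
    return NUM_PAGES;
}

/* Flush all dirty pages; returns the number of pages flushed. */
int flush_ram_cache(void)
{
    int flushed = 0;
    int pos = 0;
    while ((pos = find_next_dirty(pos)) < NUM_PAGES) {
        /* ...copy cached page 'pos' back into guest RAM here... */
        dirty[pos] = 0;  /* skipping this (or never advancing pos)
                            would make the loop spin forever */
        flushed++;
    }
    return flushed;
}
```

In the real code the thing to check would be whether the dirty-page accounting introduced by the backup patch can leave a page permanently marked dirty, so the flush loop keeps finding it.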
Thanks,
Hailiang
From: Zhang, Chen [mailto:chen.zhang@intel.com]
Sent: Friday, May 15, 2020 11:16 AM
To: Zhanghailiang <zhang.zhanghailiang@huawei.com>; Dr . David Alan Gilbert <dgilbert@redhat.com>; qemu-devel <qemu-devel@nongnu.org>; Li Zhijian <lizhijian@cn.fujitsu.com>
Cc: Jason Wang <jasowang@redhat.com>; Lukas Straub <lukasstraub2@web.de>
Subject: About migration/colo issue
...
^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: About migration/colo issue
2020-05-15 3:28 ` Zhanghailiang
@ 2020-05-15 3:32 ` Zhang, Chen
0 siblings, 0 replies; 5+ messages in thread
From: Zhang, Chen @ 2020-05-15 3:32 UTC (permalink / raw)
To: Zhanghailiang, Dr . David Alan Gilbert, qemu-devel, Li Zhijian
Cc: Jason Wang, Lukas Straub
From: Zhanghailiang <zhang.zhanghailiang@huawei.com>
Sent: Friday, May 15, 2020 11:29 AM
To: Zhang, Chen <chen.zhang@intel.com>; Dr . David Alan Gilbert <dgilbert@redhat.com>; qemu-devel <qemu-devel@nongnu.org>; Li Zhijian <lizhijian@cn.fujitsu.com>
Cc: Jason Wang <jasowang@redhat.com>; Lukas Straub <lukasstraub2@web.de>
Subject: RE: About migration/colo issue
> Hi Zhang Chen,
> From your tracing log, it seems to hang in colo_flush_ram_cache()?
> Could it be stuck in a dead loop there?

Maybe; I haven't looked at it in depth.

> I'll test it with the latest QEMU.

Thanks.

> Thanks,
> Hailiang
From: Zhang, Chen [mailto:chen.zhang@intel.com]
Sent: Friday, May 15, 2020 11:16 AM
To: Zhanghailiang <zhang.zhanghailiang@huawei.com>; Dr . David Alan Gilbert <dgilbert@redhat.com>; qemu-devel <qemu-devel@nongnu.org>; Li Zhijian <lizhijian@cn.fujitsu.com>
Cc: Jason Wang <jasowang@redhat.com>; Lukas Straub <lukasstraub2@web.de>
Subject: About migration/colo issue
...
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: About migration/colo issue
2020-05-15 3:16 About migration/colo issue Zhang, Chen
2020-05-15 3:28 ` Zhanghailiang
@ 2020-05-15 7:12 ` Lukas Straub
2020-05-15 8:05 ` Zhanghailiang
1 sibling, 1 reply; 5+ messages in thread
From: Lukas Straub @ 2020-05-15 7:12 UTC (permalink / raw)
To: Zhang, Chen
Cc: qemu-devel, Jason Wang, zhanghailiang, Li Zhijian,
Dr . David Alan Gilbert
On Fri, 15 May 2020 03:16:18 +0000
"Zhang, Chen" <chen.zhang@intel.com> wrote:
> Hi Hailiang/Dave.
>
> I found an urgent problem in the current upstream code: COLO gets stuck at the secondary checkpoint and afterwards.
> The guest hangs because of this issue.
> I bisected the upstream code; this issue was introduced by Hailiang's optimization patch:
Hmm, I'm on v5.0.0 (which contains that commit) and I don't see this issue in my testing.
Regards,
Lukas Straub
> From 0393031a16735835a441b6d6e0495a1bd14adb90 Mon Sep 17 00:00:00 2001
> From: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Date: Mon, 24 Feb 2020 14:54:10 +0800
> Subject: [PATCH] COLO: Optimize memory back-up process
>
> This patch will reduce the downtime of VM for the initial process,
> Previously, we copied all these memory in preparing stage of COLO
> while we need to stop VM, which is a time-consuming process.
> Here we optimize it by a trick, back-up every page while in migration
> process while COLO is enabled, though it affects the speed of the
> migration, but it obviously reduce the downtime of back-up all SVM'S
> memory in COLO preparing stage.
>
> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> Message-Id: <20200224065414.36524-5-zhang.zhanghailiang@huawei.com>
> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> minor typo fixes
>
> Hailiang, do you have time to look into it?
>
> ...
^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: About migration/colo issue
2020-05-15 7:12 ` Lukas Straub
@ 2020-05-15 8:05 ` Zhanghailiang
0 siblings, 0 replies; 5+ messages in thread
From: Zhanghailiang @ 2020-05-15 8:05 UTC (permalink / raw)
To: Zhang, Chen
Cc: Jason Wang, Lukas Straub, Dr . David Alan Gilbert, Li Zhijian,
qemu-devel
Hi,
I can't reproduce this issue with upstream QEMU either; it works well.
Are you using an old version?
Thanks,
Hailiang
> -----Original Message-----
> From: Lukas Straub [mailto:lukasstraub2@web.de]
> Sent: Friday, May 15, 2020 3:12 PM
> To: Zhang, Chen <chen.zhang@intel.com>
> Cc: Zhanghailiang <zhang.zhanghailiang@huawei.com>; Dr . David Alan
> Gilbert <dgilbert@redhat.com>; qemu-devel <qemu-devel@nongnu.org>; Li
> Zhijian <lizhijian@cn.fujitsu.com>; Jason Wang <jasowang@redhat.com>
> Subject: Re: About migration/colo issue
>
> On Fri, 15 May 2020 03:16:18 +0000
> "Zhang, Chen" <chen.zhang@intel.com> wrote:
>
> > Hi Hailiang/Dave.
> >
> > I found an urgent problem in the current upstream code: COLO gets stuck at the secondary checkpoint and afterwards.
> > The guest hangs because of this issue.
> > I bisected the upstream code; this issue was introduced by Hailiang's optimization patch:
>
> Hmm, I'm on v5.0.0 (which contains that commit) and I don't see this issue in my testing.
>
> Regards,
> Lukas Straub
>
> > From 0393031a16735835a441b6d6e0495a1bd14adb90 Mon Sep 17
> 00:00:00 2001
> > From: zhanghailiang <zhang.zhanghailiang@huawei.com>
> > Date: Mon, 24 Feb 2020 14:54:10 +0800
> > Subject: [PATCH] COLO: Optimize memory back-up process
> >
> > This patch will reduce the downtime of VM for the initial process,
> > Previously, we copied all these memory in preparing stage of COLO
> > while we need to stop VM, which is a time-consuming process.
> > Here we optimize it by a trick, back-up every page while in migration
> > process while COLO is enabled, though it affects the speed of the
> > migration, but it obviously reduce the downtime of back-up all SVM'S
> > memory in COLO preparing stage.
> >
> > Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
> > Message-Id:
> <20200224065414.36524-5-zhang.zhanghailiang@huawei.com>
> > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > minor typo fixes
> >
> > Hailiang, do you have time to look into it?
> >
> > ...
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2020-05-15 8:06 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-15 3:16 About migration/colo issue Zhang, Chen
2020-05-15 3:28 ` Zhanghailiang
2020-05-15 3:32 ` Zhang, Chen
2020-05-15 7:12 ` Lukas Straub
2020-05-15 8:05 ` Zhanghailiang