All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Zhang, Chen" <chen.zhang@intel.com>
To: Zhanghailiang <zhang.zhanghailiang@huawei.com>,
	"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
	qemu-devel <qemu-devel@nongnu.org>,
	Li Zhijian <lizhijian@cn.fujitsu.com>
Cc: Jason Wang <jasowang@redhat.com>, Lukas Straub <lukasstraub2@web.de>
Subject: RE: About migration/colo issue
Date: Fri, 15 May 2020 03:32:42 +0000	[thread overview]
Message-ID: <0c74f952a56442309cca6d0c7d2dde60@intel.com> (raw)
In-Reply-To: <02d979c3d0004d07abed10bb6ddeba26@huawei.com>

[-- Attachment #1: Type: text/plain, Size: 7986 bytes --]



From: Zhanghailiang <zhang.zhanghailiang@huawei.com>
Sent: Friday, May 15, 2020 11:29 AM
To: Zhang, Chen <chen.zhang@intel.com>; Dr . David Alan Gilbert <dgilbert@redhat.com>; qemu-devel <qemu-devel@nongnu.org>; Li Zhijian <lizhijian@cn.fujitsu.com>
Cc: Jason Wang <jasowang@redhat.com>; Lukas Straub <lukasstraub2@web.de>
Subject: RE: About migration/colo issue

Hi Zhang Chen,

From your tracing log, it seems to be hanged in colo_flush_ram_cache()?
Does it come across a dead loop there ?

Maybe, I haven't looked in depth.

I'll test it by using the new qemu.

Thanks

Thanks,
Hailiang

From: Zhang, Chen [mailto:chen.zhang@intel.com]
Sent: Friday, May 15, 2020 11:16 AM
To: Zhanghailiang <zhang.zhanghailiang@huawei.com<mailto:zhang.zhanghailiang@huawei.com>>; Dr . David Alan Gilbert <dgilbert@redhat.com<mailto:dgilbert@redhat.com>>; qemu-devel <qemu-devel@nongnu.org<mailto:qemu-devel@nongnu.org>>; Li Zhijian <lizhijian@cn.fujitsu.com<mailto:lizhijian@cn.fujitsu.com>>
Cc: Jason Wang <jasowang@redhat.com<mailto:jasowang@redhat.com>>; Lukas Straub <lukasstraub2@web.de<mailto:lukasstraub2@web.de>>
Subject: About migration/colo issue

Hi Hailiang/Dave.

I found a urgent problem in current upstream code, COLO will stuck on secondary checkpoint and later.
The guest will stuck by this issue.
I have bisect upstream code, this issue caused by Hailiang's optimize patch:

From 0393031a16735835a441b6d6e0495a1bd14adb90 Mon Sep 17 00:00:00 2001
From: zhanghailiang <zhang.zhanghailiang@huawei.com<mailto:zhang.zhanghailiang@huawei.com>>
Date: Mon, 24 Feb 2020 14:54:10 +0800
Subject: [PATCH] COLO: Optimize memory back-up process

This patch will reduce the downtime of VM for the initial process,
Previously, we copied all these memory in preparing stage of COLO
while we need to stop VM, which is a time-consuming process.
Here we optimize it by a trick, back-up every page while in migration
process while COLO is enabled, though it affects the speed of the
migration, but it obviously reduce the downtime of back-up all SVM'S
memory in COLO preparing stage.

Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com<mailto:zhang.zhanghailiang@huawei.com>>
Message-Id: <20200224065414.36524-5-zhang.zhanghailiang@huawei.com<mailto:20200224065414.36524-5-zhang.zhanghailiang@huawei.com>>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com<mailto:dgilbert@redhat.com>>
  minor typo fixes

Hailiang, do you have time to look into it?


The detail log:
Primary node:
13322@1589511271.917346:colo_receive_message<mailto:13322@1589511271.917346:colo_receive_message> Receive 'checkpoint-ready' message
{"timestamp": {"seconds": 1589511271, "microseconds": 917406}, "event": "RESUME"}
13322@1589511271.917842:colo_vm_state_change<mailto:13322@1589511271.917842:colo_vm_state_change> Change 'stop' => 'run'
13322@1589511291.243346:colo_send_message<mailto:13322@1589511291.243346:colo_send_message> Send 'checkpoint-request' message
13322@1589511291.243978:colo_receive_message<mailto:13322@1589511291.243978:colo_receive_message> Receive 'checkpoint-reply' message
{"timestamp": {"seconds": 1589511291, "microseconds": 244096}, "event": "STOP"}
13322@1589511291.244444:colo_vm_state_change<mailto:13322@1589511291.244444:colo_vm_state_change> Change 'run' => 'stop'
13322@1589511291.244561:colo_send_message<mailto:13322@1589511291.244561:colo_send_message> Send 'vmstate-send' message
13322@1589511291.258594:colo_send_message<mailto:13322@1589511291.258594:colo_send_message> Send 'vmstate-size' message
13322@1589511305.412479:colo_receive_message<mailto:13322@1589511305.412479:colo_receive_message> Receive 'vmstate-received' message
13322@1589511309.031826:colo_receive_message<mailto:13322@1589511309.031826:colo_receive_message> Receive 'vmstate-loaded' message
{"timestamp": {"seconds": 1589511309, "microseconds": 31862}, "event": "RESUME"}
13322@1589511309.033075:colo_vm_state_change<mailto:13322@1589511309.033075:colo_vm_state_change> Change 'stop' => 'run'
{"timestamp": {"seconds": 1589511311, "microseconds": 111617}, "event": "VNC_CONNECTED", "data": {"server": {"auth": "none", "family": "ipv4", "service": "5907", "host": "0.0.0.0", "websocket": false}, "client": {"family": "ipv4", "service": "51564", "host": "10.239.13.19", "websocket": false}}}
{"timestamp": {"seconds": 1589511311, "microseconds": 116197}, "event": "VNC_INITIALIZED", "data": {"server": {"auth": "none", "family": "ipv4", "service": "5907", "host": "0.0.0.0", "websocket": false}, "client": {"family": "ipv4", "service": "51564", "host": "10.239.13.19", "websocket": false}}}
13322@1589511311.243271:colo_send_message<mailto:13322@1589511311.243271:colo_send_message> Send 'checkpoint-request' message
13322@1589511311.351361:colo_receive_message<mailto:13322@1589511311.351361:colo_receive_message> Receive 'checkpoint-reply' message
{"timestamp": {"seconds": 1589511311, "microseconds": 351439}, "event": "STOP"}
13322@1589511311.415779:colo_vm_state_change<mailto:13322@1589511311.415779:colo_vm_state_change> Change 'run' => 'stop'
13322@1589511311.416001:colo_send_message<mailto:13322@1589511311.416001:colo_send_message> Send 'vmstate-send' message
13322@1589511311.418620:colo_send_message<mailto:13322@1589511311.418620:colo_send_message> Send 'vmstate-size' message

Secondary node:
{"timestamp": {"seconds": 1589510920, "microseconds": 778207}, "event": "RESUME"}
23619@1589510920.778835:colo_vm_state_change<mailto:23619@1589510920.778835:colo_vm_state_change> Change 'stop' => 'run'
23619@1589510920.778891:colo_send_message<mailto:23619@1589510920.778891:colo_send_message> Send 'checkpoint-ready' message
23619@1589510940.105539:colo_receive_message<mailto:23619@1589510940.105539:colo_receive_message> Receive 'checkpoint-request' message
{"timestamp": {"seconds": 1589510940, "microseconds": 105712}, "event": "STOP"}
23619@1589510940.105917:colo_vm_state_change<mailto:23619@1589510940.105917:colo_vm_state_change> Change 'run' => 'stop'
23619@1589510940.105971:colo_send_message<mailto:23619@1589510940.105971:colo_send_message> Send 'checkpoint-reply' message
23619@1589510940.106767:colo_receive_message<mailto:23619@1589510940.106767:colo_receive_message> Receive 'vmstate-send' message
23619@1589510940.122808:colo_flush_ram_cache_begin<mailto:23619@1589510940.122808:colo_flush_ram_cache_begin> dirty_pages 2456
23619@1589510953.618672:colo_flush_ram_cache_end<mailto:23619@1589510953.618672:colo_flush_ram_cache_end>
23619@1589510953.945083:colo_receive_message<mailto:23619@1589510953.945083:colo_receive_message> Receive 'vmstate-size' message
23619@1589510954.274816:colo_send_message<mailto:23619@1589510954.274816:colo_send_message> Send 'vmstate-received' message
qemu-system-x86_64: warning: TSC frequency mismatch between VM (2792980 kHz) and host (2925999 kHz), and TSC scaling unavailable
{"timestamp": {"seconds": 1589510957, "microseconds": 754184}, "event": "RESUME"}
23619@1589510957.894113:colo_vm_state_change<mailto:23619@1589510957.894113:colo_vm_state_change> Change 'stop' => 'run'
23619@1589510957.894162:colo_send_message<mailto:23619@1589510957.894162:colo_send_message> Send 'vmstate-loaded' message
23619@1589510960.105977:colo_receive_message<mailto:23619@1589510960.105977:colo_receive_message> Receive 'checkpoint-request' message
{"timestamp": {"seconds": 1589510960, "microseconds": 106148}, "event": "STOP"}
23619@1589510960.213773:colo_vm_state_change<mailto:23619@1589510960.213773:colo_vm_state_change> Change 'run' => 'stop'
23619@1589510960.213797:colo_send_message<mailto:23619@1589510960.213797:colo_send_message> Send 'checkpoint-reply' message
23619@1589510960.278771:colo_receive_message<mailto:23619@1589510960.278771:colo_receive_message> Receive 'vmstate-send' message
23619@1589510960.423268:colo_flush_ram_cache_begin<mailto:23619@1589510960.423268:colo_flush_ram_cache_begin> dirty_pages 25








[-- Attachment #2: Type: text/html, Size: 14702 bytes --]

  reply	other threads:[~2020-05-15  3:33 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-05-15  3:16 About migration/colo issue Zhang, Chen
2020-05-15  3:28 ` Zhanghailiang
2020-05-15  3:32   ` Zhang, Chen [this message]
2020-05-15  7:12 ` Lukas Straub
2020-05-15  8:05   ` Zhanghailiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0c74f952a56442309cca6d0c7d2dde60@intel.com \
    --to=chen.zhang@intel.com \
    --cc=dgilbert@redhat.com \
    --cc=jasowang@redhat.com \
    --cc=lizhijian@cn.fujitsu.com \
    --cc=lukasstraub2@web.de \
    --cc=qemu-devel@nongnu.org \
    --cc=zhang.zhanghailiang@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.