From: 858585 jemmy
Date: Sat, 8 Apr 2017 21:17:58 +0800
Subject: Re: [Qemu-devel] [PATCH v3] migration/block:limit the time used for block migration
To: Stefan Hajnoczi
Cc: qemu-devel@nongnu.org, Fam Zheng, quintela@redhat.com, dgilbert@redhat.com, qemu-block@nongnu.org, Lidong Chen

On Fri, Apr 7, 2017 at 7:34 PM, Stefan Hajnoczi wrote:
> On Fri, Apr 07, 2017 at 09:30:33AM +0800, 858585 jemmy wrote:
>> On Thu, Apr 6, 2017 at 10:02 PM, Stefan Hajnoczi wrote:
>> > On Wed, Apr 05, 2017 at 05:27:58PM +0800, jemmy858585@gmail.com wrote:
>> >
>> > A proper solution is to refactor the synchronous code to make it
>> > asynchronous. This might require invoking the system call from a
>> > thread pool worker.
>> >
>> Yes, I agree with you, but that is a big change.
>> I will try to work out how to optimize this code; it may take a long time.
>>
>> This patch is not a perfect solution, but it can alleviate the problem.
>
> Let's try to understand the problem fully first.
>
> Stefan

When I migrate a VM at high speed, VNC sometimes responds slowly. It is
not only VNC: virsh console also responds slowly at times, and block I/O
performance inside the guest drops as well.

The bug can be reproduced with:

    virsh migrate-setspeed 165cf436-312f-47e7-90f2-f8aa63f34893 900
    virsh migrate --live 165cf436-312f-47e7-90f2-f8aa63f34893 \
        --copy-storage-inc qemu+ssh://10.59.163.38/system

--copy-storage-all has no such problem:

    virsh migrate --live 165cf436-312f-47e7-90f2-f8aa63f34893 \
        --copy-storage-all qemu+ssh://10.59.163.38/system

Comparing --copy-storage-inc with --copy-storage-all, I found the reason:
mig_save_device_bulk() invokes bdrv_is_allocated(), which is synchronous
and may block for a long time.

I wrote this code to measure the time used by bdrv_is_allocated():

    static int64_t max_time;
    int64_t tmp;
    struct timespec ts1, ts2;

    clock_gettime(CLOCK_MONOTONIC_RAW, &ts1);
    ret = bdrv_is_allocated(blk_bs(bb), cur_sector,
                            MAX_IS_ALLOCATED_SEARCH, &nr_sectors);
    clock_gettime(CLOCK_MONOTONIC_RAW, &ts2);

    /* elapsed nanoseconds; int64_t so the difference cannot
     * overflow a 32-bit int */
    tmp = (ts2.tv_sec - ts1.tv_sec) * 1000000000LL
          + (ts2.tv_nsec - ts1.tv_nsec);
    if (tmp > max_time) {
        max_time = tmp;
        fprintf(stderr, "max_time is %" PRId64 "\n", max_time);
    }

The test results are below:

    max_time is 37014
    max_time is 1075534
    max_time is 17180913
    max_time is 28586762
    max_time is 49563584
    max_time is 103085447
    max_time is 110836833
    max_time is 120331438

The worst case is about 120 ms for a single bdrv_is_allocated() call.

bdrv_is_allocated() is called with qemu_mutex_lock_iothread() held, and
the main thread also calls qemu_mutex_lock_iothread(), so the main thread
may be blocked for a long time:
    if (bmds->shared_base) {
        qemu_mutex_lock_iothread();
        aio_context_acquire(blk_get_aio_context(bb));
        /* Skip unallocated sectors; intentionally treats failure as
         * an allocated sector */
        while (cur_sector < total_sectors &&
               !bdrv_is_allocated(blk_bs(bb), cur_sector,
                                  MAX_IS_ALLOCATED_SEARCH, &nr_sectors)) {
            cur_sector += nr_sectors;
        }
        aio_context_release(blk_get_aio_context(bb));
        qemu_mutex_unlock_iothread();
    }

Here is a backtrace of the main thread blocked in
qemu_mutex_lock_iothread() while that loop runs:

    #0  0x00007f107322f264 in __lll_lock_wait () from /lib64/libpthread.so.0
    #1  0x00007f107322a508 in _L_lock_854 () from /lib64/libpthread.so.0
    #2  0x00007f107322a3d7 in pthread_mutex_lock () from /lib64/libpthread.so.0
    #3  0x0000000000949ecb in qemu_mutex_lock (mutex=0xfc51a0) at util/qemu-thread-posix.c:60
    #4  0x0000000000459e58 in qemu_mutex_lock_iothread () at /root/qemu/cpus.c:1516
    #5  0x0000000000945322 in os_host_main_loop_wait (timeout=28911939) at util/main-loop.c:258
    #6  0x00000000009453f2 in main_loop_wait (nonblocking=0) at util/main-loop.c:517
    #7  0x00000000005c76b4 in main_loop () at vl.c:1898
    #8  0x00000000005ceb77 in main (argc=49, argv=0x7fff921182b8, envp=0x7fff92118448) at vl.c:4709
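For reference, here is a minimal standalone sketch of the direction Stefan
suggested above (running the blocking call in a thread pool worker so the
lock is never held across it). This is not QEMU code: big_lock stands in
for the iothread mutex and blocking_check() is a hypothetical stand-in for
bdrv_is_allocated(); in QEMU itself this would presumably build on the
existing thread pool (include/block/thread-pool.h) and coroutines rather
than raw pthreads.

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Hypothetical stand-in for the slow, synchronous bdrv_is_allocated(). */
    static int blocking_check(long sector)
    {
        usleep(100 * 1000);     /* simulate ~100 ms spent in a syscall */
        return sector % 2;      /* pretend odd sectors are allocated */
    }

    static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
    static bool done;
    static int result;

    /* Worker: run the slow call WITHOUT holding big_lock, then publish
     * the result under the lock. */
    static void *worker(void *arg)
    {
        int r = blocking_check(*(long *)arg);
        pthread_mutex_lock(&big_lock);
        result = r;
        done = true;
        pthread_mutex_unlock(&big_lock);
        return NULL;
    }

    int main(void)
    {
        long sector = 12345;
        pthread_t tid;
        pthread_create(&tid, NULL, worker, &sector);

        /* The "main loop" only takes big_lock briefly, so it stays
         * responsive while the worker blocks in the syscall. */
        for (;;) {
            pthread_mutex_lock(&big_lock);
            bool finished = done;
            pthread_mutex_unlock(&big_lock);
            if (finished) {
                break;
            }
            usleep(1000);       /* service VNC, monitor, timers, ... */
        }

        pthread_join(tid, NULL);
        printf("sector %ld allocated: %d\n", sector, result);
        return 0;
    }

The key property is that the lock acquisition in the main loop can no
longer block for the full duration of the allocation check, which is
exactly what the backtrace above shows happening today.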