Date: Fri, 14 Oct 2016 12:15:48 +0100
From: "Dr. David Alan Gilbert"
To: Chunguang Li
Cc: Amit Shah, qemu-devel@nongnu.org, pbonzini@redhat.com, stefanha@redhat.com, quintela@redhat.com
Subject: Re: [Qemu-devel] Migration dirty bitmap: should only mark pages as dirty after they have been sent
Message-ID: <20161014111548.GD2030@work-vm>
In-Reply-To: <1401177.991b.157a34a9dea.Coremail.lichunguang@hust.edu.cn>
References: <5feb15.7e53.1576070ae2d.Coremail.lichunguang@hust.edu.cn> <20160926112349.GF2029@work-vm> <13289d.86da.15766fdf27c.Coremail.lichunguang@hust.edu.cn> <20160930054610.GA1429@amit-lp.rh> <1401177.991b.157a34a9dea.Coremail.lichunguang@hust.edu.cn>

* Chunguang Li (lichunguang@hust.edu.cn) wrote:
>
> > -----Original Message-----
> > From: "Amit Shah"
> > Sent: Friday, September 30, 2016
> > To: "Chunguang Li"
> > Cc: "Dr. David Alan Gilbert", qemu-devel@nongnu.org, pbonzini@redhat.com, stefanha@redhat.com, quintela@redhat.com
> > Subject: Re: Re: [Qemu-devel] Migration dirty bitmap: should only mark pages as dirty after they have been sent
> >
> > On (Mon) 26 Sep 2016 [22:55:01], Chunguang Li wrote:
> > >
> > > > -----Original Message-----
> > > > From: "Dr. David Alan Gilbert"
> > > > Sent: Monday, September 26, 2016
> > > > To: "Chunguang Li"
> > > > Cc: qemu-devel@nongnu.org, amit.shah@redhat.com, pbonzini@redhat.com, stefanha@redhat.com, quintela@redhat.com
> > > > Subject: Re: [Qemu-devel] Migration dirty bitmap: should only mark pages as dirty after they have been sent
> > > >
> > > > * Chunguang Li (lichunguang@hust.edu.cn) wrote:
> > > > > Hi all!
> > > > > I am confused about the dirty bitmap during migration. I have dug into the code and found that every now and then during migration, the dirty bitmap is fetched from kernel space through ioctl(KVM_GET_DIRTY_LOG) and then used to update QEMU's dirty bitmap. However, I think this mechanism leads to resending some pages that are NOT dirty.
> > > > >
> > > > > Take the first iteration of precopy, for instance, during which all the pages are sent. Before that, during migration setup, ioctl(KVM_GET_DIRTY_LOG) is called once, so the kernel starts producing the dirty bitmap from that moment. When pages that haven't been sent yet are written, the kernel marks them as dirty. However, I don't think this is correct, because these pages will be sent later in this iteration, and then sent again in the next iteration with the same content (if they are not written again after they are sent). It only makes sense to mark a page as dirty when it is written after it has already been sent during the current iteration.
> > > > >
> > > > > Am I right about this? If so, is there any advice on how to improve it?
> > > >
> > > > I think you're right that this can happen; to clarify, I think the
> > > > case you're talking about is:
> > > >
> > > >   Iteration 1
> > > >     sync bitmap
> > > >     start sending pages
> > > >     page 'n' is modified, but hasn't been sent yet
> > > >     page 'n' gets sent
> > > >   Iteration 2
> > > >     sync bitmap
> > > >       'page n is shown as modified'
> > > >     send page 'n' again
> > > >
> > >
> > > Yes, this is exactly the case I am talking about.
> > >
> > > > So you're right that it is wasteful; I guess it's more wasteful
> > > > on big VMs with slow networks, where each iteration takes a long time.
> > >
> > > I think this is "very" wasteful. Assume the workload dirties pages randomly within the guest address space, and the transfer speed is constant. Intuitively, I think nearly half of the dirty pages produced in Iteration 1 are not really dirty. This means Iteration 2 takes twice as long as it would if only the really dirty pages were sent.
> >
> > That makes sense; can you get some perf numbers to show which kinds of
> > workloads are impacted the most? That would also help us figure out
> > what kind of speed improvements we can expect.
> >
> >
> > Amit
>
> I picked 6 workloads and collected the following statistics for
> every iteration (except the last, stop-copy one) during precopy.
> These numbers were obtained with basic precopy migration, without
> capabilities such as xbzrle or compression. The migration uses a
> dedicated network, separate from the network used by the workloads;
> both are gigabit Ethernet. I used qemu-2.5.1.
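The false-dirty case described in the thread above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not QEMU code: the hypothetical helper `simulate_iteration` assumes the sender transmits page i at time i (so one pass lasts `num_pages` ticks) and that the guest writes uniformly random pages at uniformly random times. It compares what the dirty log reports at the next sync (every page written since the last sync) with the pages that are really stale on the destination (written after they were sent). Under these assumptions a page written once is equally likely to be hit before or after the sender reaches it, which is where the "nearly half" intuition comes from.

```python
import random


def simulate_iteration(num_pages, num_writes, seed=0):
    """Model one precopy pass and count logged, really dirty and false-dirty pages.

    Hypothetical model, not QEMU code: the sender transmits page i at time i,
    so the pass lasts num_pages ticks, and the guest issues num_writes writes,
    each to a uniformly random page at a uniformly random time in the pass.
    """
    rng = random.Random(seed)
    writes = [(rng.randrange(num_pages), rng.uniform(0, num_pages))
              for _ in range(num_writes)]

    # Current behaviour: the bitmap was synced before the pass, so every page
    # written during the pass is reported dirty at the next sync.
    logged_dirty = {page for page, _ in writes}

    # Proposed behaviour: a write only matters if the page had already been
    # sent (its send time equals its index) when the write happened.
    really_dirty = {page for page, t in writes if t > page}

    false_dirty = logged_dirty - really_dirty
    return len(logged_dirty), len(really_dirty), len(false_dirty)


logged, real, false_dirty = simulate_iteration(num_pages=100_000, num_writes=20_000)
print(f"logged dirty: {logged}, really dirty: {real}, "
      f"false dirty: {false_dirty} ({false_dirty / logged:.0%})")
```

With these parameters the false-dirty share comes out a little under one half: pages hit by more than one write are more likely to receive at least one write after they were sent, and those are genuinely dirty.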
>
> Three of them (booting, idle, web server) converged to the stop-copy phase
> with the given bandwidth and the default downtime (300ms), while the other
> three (kernel compilation, zeusmp, memcached) did not.
>
> A page is "not-really-dirty" if, during one iteration, it is written first
> and sent later (and not written again after that). I guess this happens far
> less often in the other iterations than in the 1st iteration, because all
> the pages of the VM are sent to the destination node during the 1st
> iteration, while the other iterations send only some of the pages. So I
> think the "not-really-dirty" pages are produced mainly during the 1st
> iteration, and maybe very few during the other iterations.
>
> If we could avoid resending the "not-really-dirty" pages, I intuitively
> think the time spent on Iteration 2 would be halved. This causes a chain
> reaction: the number of dirty pages produced during Iteration 2 is halved,
> so the time spent on Iteration 3 is halved, then Iteration 4, 5...

Yes; these numbers don't show how many of them are false dirty though.

One problem is thinking about pages that have been redirtied: if the page is
dirtied after the sync but before the network write, then it's the
false-dirty that you're describing. However, if the page is being written a
few times, so that it would also have been written after the network write,
then it isn't a false-dirty.

You might be able to figure that out with some kernel tracing of when the
dirtying happens, but it might be easier to write the fix!

Dave

> So I think "booting" and "kernel compilation" should benefit a lot from
> this improvement. "Kernel compilation" would benefit because some
> iterations take around 600ms; if they were halved to 300ms, precopy might
> get the chance to enter the stop-and-copy phase.
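Dave's distinction between false-dirty and genuinely redirtied pages can be written down as a per-page rule. The sketch below assumes a hypothetical trace, for example gathered with the kernel tracing he suggests, that provides the bitmap sync time, the time the page was written to the migration stream, and the guest's write timestamps for that page; `classify_page` and `PageState` are illustrative names, not QEMU identifiers.

```python
from enum import Enum


class PageState(Enum):
    CLEAN = "clean"                 # not written since the bitmap sync
    FALSE_DIRTY = "false dirty"     # written only before it was sent
    REALLY_DIRTY = "really dirty"   # written after it was sent


def classify_page(sync_time, send_time, write_times):
    """Classify one page for one precopy iteration.

    Hypothetical inputs: sync_time is when the dirty bitmap was synced at the
    start of the iteration, send_time is when the page went onto the wire,
    and write_times are the guest's write timestamps for this page.
    """
    # Writes before the sync were already picked up by that sync.
    writes = [t for t in write_times if t >= sync_time]
    if not writes:
        return PageState.CLEAN
    if any(t > send_time for t in writes):
        # The destination copy is stale, so the page must be resent.
        return PageState.REALLY_DIRTY
    # Every write landed before the send, so the sent copy already holds it,
    # yet the next bitmap sync still reports the page as dirty.
    return PageState.FALSE_DIRTY


# Written once before being sent.
print(classify_page(sync_time=0.0, send_time=5.0, write_times=[2.0]))
# prints PageState.FALSE_DIRTY
# Written before being sent and again afterwards.
print(classify_page(sync_time=0.0, send_time=5.0, write_times=[2.0, 7.0]))
# prints PageState.REALLY_DIRTY
```

Counting the FALSE_DIRTY pages per iteration would give the figure that, as Dave notes, the statistics in this thread do not show.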
>
> On the other hand, "idle" and "web server" would not benefit much, because
> most of the time is spent on the 1st iteration and little on the others.
>
> As for "zeusmp" and "memcached", although the time spent on the iterations
> after the 1st may be halved, they still could not converge to stop-and-copy
> with the 300ms downtime.
>
> -------------------- 1 vcpu, 1 GB RAM, default bandwidth (32MB/s): --------------------
>
> 1. booting: migration starts while the VM is booting
>
> Iteration 1, duration: 6997 ms, transferred pages: 266450 (n: 57269, d: 209181), new dirty pages: 56414, remaining dirty pages: 56414
> Iteration 2, duration: 6497 ms, transferred pages: 54008 (n: 52701, d: 1307), new dirty pages: 48053, remaining dirty pages: 50459
> Iteration 3, duration: 5800 ms, transferred pages: 48232 (n: 47444, d: 788), new dirty pages: 9129, remaining dirty pages: 11356
> Iteration 4, duration: 1100 ms, transferred pages: 9091 (n: 8998, d: 93), new dirty pages: 165, remaining dirty pages: 2430
> Iteration 5, duration: 1 ms, transferred pages: 0 (n: 0, d: 0), new dirty pages: 0, remaining dirty pages: 2430
> (Note: when the workload does converge, the output of the last iteration is "fake"; it just indicates that precopy is now entering the stop-copy phase.
> "n" means "normal pages" and "d" means "duplicate (zero) pages".)
>
> 2. idle
>
> Iteration 1, duration: 14496 ms, transferred pages: 266450 (n: 118980, d: 147470), new dirty pages: 17398, remaining dirty pages: 17398
> Iteration 2, duration: 1896 ms, transferred pages: 14953 (n: 14854, d: 99), new dirty pages: 1849, remaining dirty pages: 4294
> Iteration 3, duration: 300 ms, transferred pages: 2454 (n: 2454, d: 0), new dirty pages: 9, remaining dirty pages: 1849
> Iteration 4, duration: 1 ms, transferred pages: 0 (n: 0, d: 0), new dirty pages: 0, remaining dirty pages: 1849
>
> 3. kernel compilation (cannot converge)
>
> Iteration 1, duration: 20700 ms, transferred pages: 266450 (n: 169778, d: 96672), new dirty pages: 40067, remaining dirty pages: 40067
> Iteration 2, duration: 4696 ms, transferred pages: 38401 (n: 37787, d: 614), new dirty pages: 8852, remaining dirty pages: 10518
> Iteration 3, duration: 1000 ms, transferred pages: 8642 (n: 8180, d: 462), new dirty pages: 6331, remaining dirty pages: 8207
> Iteration 4, duration: 700 ms, transferred pages: 6110 (n: 5726, d: 384), new dirty pages: 5242, remaining dirty pages: 7339
> Iteration 5, duration: 600 ms, transferred pages: 5007 (n: 4908, d: 99), new dirty pages: 4868, remaining dirty pages: 7200
> Iteration 6, duration: 600 ms, transferred pages: 5226 (n: 4908, d: 318), new dirty pages: 6142, remaining dirty pages: 8116
> Iteration 7, duration: 700 ms, transferred pages: 5985 (n: 5726, d: 259), new dirty pages: 5902, remaining dirty pages: 8033
> Iteration 8, duration: 701 ms, transferred pages: 5893 (n: 5726, d: 167), new dirty pages: 7502, remaining dirty pages: 9642
> Iteration 9, duration: 900 ms, transferred pages: 7623 (n: 7362, d: 261), new dirty pages: 6408, remaining dirty pages: 8427
> Iteration 10, duration: 700 ms, transferred pages: 6008 (n: 5726, d: 282), new dirty pages: 8312, remaining dirty pages: 10731
> Iteration 11, duration: 1000 ms, transferred pages: 8353 (n: 8180, d: 173), new dirty pages: 6874, remaining dirty pages: 9252
> Iteration 12, duration: 899 ms, transferred pages: 7477 (n: 7362, d: 115), new dirty pages: 5573, remaining dirty pages: 7348
> Iteration 13, duration: 601 ms, transferred pages: 5099 (n: 4908, d: 191), new dirty pages: 7671, remaining dirty pages: 9920
> Iteration 14, duration: 900 ms, transferred pages: 7586 (n: 7362, d: 224), new dirty pages: 7359, remaining dirty pages: 9693
> Iteration 15, duration: 900 ms, transferred pages: 7682 (n: 7362, d: 320), new dirty pages: 7371, remaining dirty pages: 9382
>
> 4. cpu2006.zeusmp (cannot converge)
>
> Iteration 1, duration: 21603 ms, transferred pages: 266450 (n: 176660, d: 89790), new dirty pages: 145625, remaining dirty pages: 145625
> Iteration 2, duration: 8696 ms, transferred pages: 144389 (n: 70862, d: 73527), new dirty pages: 125124, remaining dirty pages: 126360
> Iteration 3, duration: 6301 ms, transferred pages: 124057 (n: 51379, d: 72678), new dirty pages: 122528, remaining dirty pages: 124831
> Iteration 4, duration: 6400 ms, transferred pages: 124330 (n: 52196, d: 72134), new dirty pages: 124267, remaining dirty pages: 124768
> Iteration 5, duration: 6703 ms, transferred pages: 124034 (n: 54656, d: 69378), new dirty pages: 124151, remaining dirty pages: 124885
> Iteration 6, duration: 6703 ms, transferred pages: 124357 (n: 54658, d: 69699), new dirty pages: 124106, remaining dirty pages: 124634
> Iteration 7, duration: 6602 ms, transferred pages: 124568 (n: 53838, d: 70730), new dirty pages: 133828, remaining dirty pages: 133894
> Iteration 8, duration: 7600 ms, transferred pages: 133030 (n: 62021, d: 71009), new dirty pages: 126612, remaining dirty pages: 127476
> Iteration 9, duration: 7299 ms, transferred pages: 126511 (n: 59569, d: 66942), new dirty pages: 122727, remaining dirty pages: 123692
> Iteration 10, duration: 6609 ms, transferred pages: 123692 (n: 54539, d: 69153), new dirty pages: 122727, remaining dirty pages: 122727
> Iteration 11, duration: 6995 ms, transferred pages: 120347 (n: 56423, d: 63924), new dirty pages: 121430, remaining dirty pages: 123810
> Iteration 12, duration: 6703 ms, transferred pages: 123040 (n: 54657, d: 68383), new dirty pages: 122043, remaining dirty pages: 122813
> Iteration 13, duration: 7006 ms, transferred pages: 122353 (n: 57121, d: 65232), new dirty pages: 133869, remaining dirty pages: 134329
> Iteration 14, duration: 8209 ms, transferred pages: 132325 (n: 66932, d: 65393), new dirty pages: 126914, remaining dirty pages: 128918
> Iteration 15, duration: 7802 ms, transferred pages: 126931 (n: 63671, d: 63260), new dirty pages: 122351, remaining dirty pages: 124338
>
> 5. web server: an Apache web server. The client is configured with 50 concurrent connections.
>
> Iteration 1, duration: 30697 ms, transferred pages: 266450 (n: 251215, d: 15235), new dirty pages: 30628, remaining dirty pages: 30628
> Iteration 2, duration: 3496 ms, transferred pages: 28859 (n: 28513, d: 346), new dirty pages: 5805, remaining dirty pages: 7574
> Iteration 3, duration: 701 ms, transferred pages: 5746 (n: 5726, d: 20), new dirty pages: 3433, remaining dirty pages: 5261
> Iteration 4, duration: 400 ms, transferred pages: 3281 (n: 3272, d: 9), new dirty pages: 1539, remaining dirty pages: 3519
> Iteration 5, duration: 199 ms, transferred pages: 1653 (n: 1636, d: 17), new dirty pages: 301, remaining dirty pages: 2167
> Iteration 6, duration: 1 ms, transferred pages: 0 (n: 0, d: 0), new dirty pages: 0, remaining dirty pages: 2167
>
> -------------------- 6 vcpu, 6 GB RAM, max bandwidth (941.08 Mbps): --------------------
>
> 6. memcached: 4 GB cache, memaslap: all writes, concurrency = 5 (cannot converge)
>
> Iteration 1, duration: 42486 ms, transferred pages: 1568087 (n: 1216079, d: 352008), new dirty pages: 571940, remaining dirty pages: 581023
> Iteration 2, duration: 19774 ms, transferred pages: 571700 (n: 567416, d: 4284), new dirty pages: 331690, remaining dirty pages: 341013
> Iteration 3, duration: 11589 ms, transferred pages: 332187 (n: 332095, d: 92), new dirty pages: 222725, remaining dirty pages: 231551
> Iteration 4, duration: 7790 ms, transferred pages: 223571 (n: 223499, d: 72), new dirty pages: 157658, remaining dirty pages: 165638
> Iteration 5, duration: 5518 ms, transferred pages: 158056 (n: 157998, d: 58), new dirty pages: 128130, remaining dirty pages: 135712
> Iteration 6, duration: 4442 ms, transferred pages: 127764 (n: 127701, d: 63), new dirty pages: 104839, remaining dirty pages: 112787
> Iteration 7, duration: 3649 ms, transferred pages: 104581 (n: 104523, d: 58), new dirty pages: 100736, remaining dirty pages: 108942
> Iteration 8, duration: 3532 ms, transferred pages: 101379 (n: 101315, d: 64), new dirty pages: 87869, remaining dirty pages: 95432
> Iteration 9, duration: 3030 ms, transferred pages: 86841 (n: 86786, d: 55), new dirty pages: 77505, remaining dirty pages: 86096
> Iteration 10, duration: 2709 ms, transferred pages: 77875 (n: 77814, d: 61), new dirty pages: 77197, remaining dirty pages: 85418
> Iteration 11, duration: 2696 ms, transferred pages: 77107 (n: 77044, d: 63), new dirty pages: 65010, remaining dirty pages: 73321
> Iteration 12, duration: 2308 ms, transferred pages: 66540 (n: 66484, d: 56), new dirty pages: 64388, remaining dirty pages: 71169
> Iteration 13, duration: 2198 ms, transferred pages: 62953 (n: 62897, d: 56), new dirty pages: 62773, remaining dirty pages: 70989
> Iteration 14, duration: 2214 ms, transferred pages: 63466 (n: 63411, d: 55), new dirty pages: 67538, remaining dirty pages: 75061
> Iteration 15, duration: 2329 ms, transferred pages: 66924 (n: 66875, d: 49), new dirty pages: 63580, remaining dirty pages: 71717
> Iteration 16, duration: 2252 ms, transferred pages: 64554 (n: 64539, d: 15), new dirty pages: 63094, remaining dirty pages: 70257
> Iteration 17, duration: 2188 ms, transferred pages: 62697 (n: 62641, d: 56), new dirty pages: 63016, remaining dirty pages: 70576
> Iteration 18, duration: 2171 ms, transferred pages: 62377 (n: 62322, d: 55), new dirty pages: 56764, remaining dirty pages: 64963
> Iteration 19, duration: 2003 ms, transferred pages: 57382 (n: 57324, d: 58), new dirty pages: 65307, remaining dirty pages: 72888
> Iteration 20, duration: 2240 ms, transferred pages: 64426 (n: 64364, d: 62), new dirty pages: 61585, remaining dirty pages: 70047
>
> --
> Chunguang Li, Ph.D. Candidate
> Wuhan National Laboratory for Optoelectronics (WNLO)
> Huazhong University of Science & Technology (HUST)
> Wuhan, Hubei Prov., China

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
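The per-iteration statistics above satisfy a simple bookkeeping identity, which is a useful sanity check when collecting similar numbers: transferred pages equal n + d, and the remaining dirty count equals the previous remaining count minus the pages transferred plus the new dirty pages (with every page dirty before iteration 1). The sketch below checks this for the first four "booting" iterations and estimates the stop-copy time as remaining pages × page size ÷ bandwidth. The 4 KiB page size, reading "32MB/s" as 32 MiB/s, and this simplified convergence test are assumptions; it is not QEMU's exact convergence check, and `check_bookkeeping` and `stop_copy_seconds` are hypothetical helpers.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096                # assumption: 4 KiB guest pages
BANDWIDTH = 32 * 1024 * 1024    # assumption: "32MB/s" read as 32 MiB/s
MAX_DOWNTIME = 0.300            # default downtime from the thread, in seconds


@dataclass
class Iteration:
    duration_ms: int
    transferred: int
    normal: int      # "n"
    zero: int        # "d", duplicate (zero) pages
    new_dirty: int
    remaining: int


# First four precopy iterations of the "booting" run, copied from the table.
BOOTING = [
    Iteration(6997, 266450, 57269, 209181, 56414, 56414),
    Iteration(6497, 54008, 52701, 1307, 48053, 50459),
    Iteration(5800, 48232, 47444, 788, 9129, 11356),
    Iteration(1100, 9091, 8998, 93, 165, 2430),
]


def check_bookkeeping(rows, total_pages):
    """Check transferred == n + d and the remaining-dirty recurrence."""
    remaining = total_pages  # before iteration 1 every page counts as dirty
    for i, row in enumerate(rows, start=1):
        if row.transferred != row.normal + row.zero:
            raise ValueError(f"iteration {i}: transferred != n + d")
        remaining = remaining - row.transferred + row.new_dirty
        if remaining != row.remaining:
            raise ValueError(f"iteration {i}: remaining dirty pages mismatch")
    return True


def stop_copy_seconds(remaining_pages):
    """Estimated stop-copy time if every remaining page is sent in full."""
    return remaining_pages * PAGE_SIZE / BANDWIDTH


check_bookkeeping(BOOTING, total_pages=266450)
for i, row in enumerate(BOOTING, start=1):
    t = stop_copy_seconds(row.remaining)
    verdict = "stop-copy" if t <= MAX_DOWNTIME else "keep iterating"
    print(f"after iteration {i}: {row.remaining} pages ~ {t * 1000:.1f} ms -> {verdict}")
```

Under these assumptions only the state after iteration 4 (about 297 ms) fits within the 300 ms downtime, which matches the table: iteration 5 is the "fake" entry that marks the switch to stop-copy.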