Date: Fri, 30 Sep 2016 11:16:10 +0530
From: Amit Shah
To: Chunguang Li
Cc: "Dr. David Alan Gilbert", qemu-devel@nongnu.org, pbonzini@redhat.com, stefanha@redhat.com, quintela@redhat.com
Subject: Re: [Qemu-devel] Migration dirty bitmap: should only mark pages as dirty after they have been sent
Message-ID: <20160930054610.GA1429@amit-lp.rh>
In-Reply-To: <13289d.86da.15766fdf27c.Coremail.lichunguang@hust.edu.cn>
References: <5feb15.7e53.1576070ae2d.Coremail.lichunguang@hust.edu.cn> <20160926112349.GF2029@work-vm> <13289d.86da.15766fdf27c.Coremail.lichunguang@hust.edu.cn>

On (Mon) 26 Sep 2016 [22:55:01], Chunguang Li wrote:
>
> > -----Original Message-----
> > From: "Dr. David Alan Gilbert"
> > Sent: Monday, 26 September 2016
> > To: "Chunguang Li"
> > Cc: qemu-devel@nongnu.org, amit.shah@redhat.com, pbonzini@redhat.com, stefanha@redhat.com, quintela@redhat.com
> > Subject: Re: [Qemu-devel] Migration dirty bitmap: should only mark pages as dirty after they have been sent
> >
> > * Chunguang Li (lichunguang@hust.edu.cn) wrote:
> > > Hi all!
> > > I have some confusion about the dirty bitmap during migration. I
> > > have dug into the code and figured out that every now and then
> > > during migration, the dirty bitmap is fetched from kernel space
> > > through ioctl(KVM_GET_DIRTY_LOG) and then used to update qemu's
> > > dirty bitmap. However, I think this mechanism leads to some
> > > non-dirty pages being resent.
> > >
> > > Take the first iteration of precopy for instance, during which all
> > > the pages will be sent. Before that, during migration setup,
> > > ioctl(KVM_GET_DIRTY_LOG) is called once, so the kernel begins to
> > > produce the dirty bitmap from this moment. When pages "that haven't
> > > been sent" are written, the kernel marks them as dirty. However, I
> > > don't think this is correct, because these pages will be sent during
> > > this iteration and again during the next one with the same content
> > > (if they are not written again after they are sent). It only makes
> > > sense to mark a page as dirty when it is written after it has
> > > already been sent during the current iteration.
> > >
> > >
> > > Am I right about this consideration? If I am right, is there some
> > > advice to improve this?
> >
> > I think you're right that this can happen; to clarify I think the
> > case you're talking about is:
> >
> >   Iteration 1
> >     sync bitmap
> >     start sending pages
> >     page 'n' is modified - but hasn't been sent yet
> >     page 'n' gets sent
> >   Iteration 2
> >     sync bitmap
> >        'page n is shown as modified'
> >     send page 'n' again
> >
>
> Yes, this is exactly the case I am talking about.
>
> > So you're right that is wasteful; I guess it's more wasteful
> > on big VMs with slow networks where the length of each iteration
> > is large.
>
> I think this is "very" wasteful. Assume the workload dirties pages
> randomly within the guest address space, and the transfer speed is
> constant. Intuitively, I think nearly half of the dirty pages produced
> in Iteration 1 are not really dirty. This means Iteration 2 takes twice
> as long as it would if only the really dirty pages were sent.

That makes sense. Can you get some perf numbers to show what kinds of
workloads are impacted the most? That would also help us figure out
what kinds of speed improvements we can expect.

		Amit
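
The "nearly half" estimate quoted above can be sanity-checked with a toy model before collecting real perf numbers. The sketch below is not QEMU code and makes strong assumptions: pages are sent in address order at a constant rate, guest writes hit uniformly random pages at uniformly random times within the iteration, and the function name and parameters are hypothetical.

```python
import random


def simulate_first_iteration(num_pages, num_writes, seed=0):
    """Toy model of one precopy pass; returns (marked_dirty, really_dirty).

    Assumptions (not taken from QEMU): page i is sent at time i / num_pages,
    and each write hits a random page at a random time in [0, 1).
    """
    rng = random.Random(seed)
    marked = set()  # pages the dirty bitmap reports at the next sync
    really = set()  # pages written at or after the moment they were sent
    for _ in range(num_writes):
        page = rng.randrange(num_pages)
        when = rng.random()
        marked.add(page)
        if when >= page / num_pages:
            # The copy already sent is stale, so a resend is needed.
            really.add(page)
    return len(marked), len(really)


if __name__ == "__main__":
    marked, really = simulate_first_iteration(num_pages=100_000,
                                              num_writes=10_000)
    redundant = 1 - really / marked
    print(f"marked dirty: {marked}, really dirty: {really}, "
          f"redundant fraction: {redundant:.2f}")
```

Under these assumptions, a sparse workload (far fewer writes than pages) should give a redundant fraction close to one half, which matches the intuition in the thread. As writes become denser the fraction shrinks, because most pages are then also written after they have been sent. Real workloads are unlikely to be uniform, so this only bounds expectations; it does not replace measurements.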