RE: [PATCH] KVM: x86: enable dirty log gradually in small chunks

From: "Zhoujian (jay)" <jianjay.zhou@huawei.com>
To: Peter Xu <peterx@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"wangxin (U)" <wangxinxin.wang@huawei.com>,
	"linfeng (M)" <linfeng23@huawei.com>,
	"Huangweidong (C)" <weidong.huang@huawei.com>,
	"Liujinsong (Paul)" <liu.jinsong@huawei.com>
Subject: RE: [PATCH] KVM: x86: enable dirty log gradually in small chunks
Date: Wed, 19 Feb 2020 04:11:48 +0000	[thread overview]
Message-ID: <B2D15215269B544CADD246097EACE7474BAFEBB5@DGGEMM528-MBX.china.huawei.com> (raw)
In-Reply-To: <20200218172627.GD1408806@xz-x1>

Hi Peter,

> -----Original Message-----
> From: Peter Xu [mailto:peterx@redhat.com]
> Sent: Wednesday, February 19, 2020 1:26 AM
> To: Zhoujian (jay) <jianjay.zhou@huawei.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>; kvm@vger.kernel.org; wangxin (U)
> <wangxinxin.wang@huawei.com>; linfeng (M) <linfeng23@huawei.com>;
> Huangweidong (C) <weidong.huang@huawei.com>; Liujinsong (Paul)
> <liu.jinsong@huawei.com>
> Subject: Re: [PATCH] KVM: x86: enable dirty log gradually in small chunks
> 
> On Tue, Feb 18, 2020 at 01:39:36PM +0000, Zhoujian (jay) wrote:
> > Hi Paolo,
> >
> > > -----Original Message-----
> > > From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> > > Sent: Tuesday, February 18, 2020 7:40 PM
> > > To: Zhoujian (jay) <jianjay.zhou@huawei.com>; kvm@vger.kernel.org
> > > Cc: peterx@redhat.com; wangxin (U) <wangxinxin.wang@huawei.com>;
> > > linfeng (M) <linfeng23@huawei.com>; Huangweidong (C)
> > > <weidong.huang@huawei.com>
> > > Subject: Re: [PATCH] KVM: x86: enable dirty log gradually in small
> > > chunks
> > >
> > > On 18/02/20 12:00, Jay Zhou wrote:
> > > > It could take kvm->mmu_lock for an extended period of time when
> > > > enabling dirty log for the first time. The main cost is to clear
> > > > all the D-bits of last level SPTEs. This situation can benefit
> > > > from manual dirty log protect as well, which can reduce the mmu_lock
> time taken.
> > > > The sequence is like this:
> > > >
> > > > 1. Set all the bits of the first dirty bitmap to 1 when enabling
> > > >    dirty log for the first time
> > > > 2. Only write protect the huge pages 3. KVM_GET_DIRTY_LOG returns
> > > > the dirty bitmap info 4.
> > > > KVM_CLEAR_DIRTY_LOG will clear D-bit for each of the leaf level
> > > >    SPTEs gradually in small chunks
> > > >
> > > > Under the Intel(R) Xeon(R) Gold 6152 CPU @ 2.10GHz environment, I
> > > > did some tests with a 128G windows VM and counted the time taken
> > > > of memory_global_dirty_log_start, here is the numbers:
> > > >
> > > > VM Size        Before    After optimization
> > > > 128G           460ms     10ms
> > >
> > > This is a good idea, but could userspace expect the bitmap to be 0
> > > for pages that haven't been touched?
> >
> > The userspace gets the bitmap information only from the kernel side.
> > It depends on the kernel side to distinguish whether the pages have
> > been touched I think, which using the rmap to traverse for now. I
> > haven't the other ideas yet, :-(
> >
> > But even though the userspace gets 1 for pages that haven't been
> > touched, these pages will be filtered out too in the kernel space
> > KVM_CLEAR_DIRTY_LOG ioctl path, since the rmap does not exist I think.
> >
> > > I think this should be added as a new bit to the KVM_ENABLE_CAP for
> > > KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2.  That is:
> > >
> > > - in kvm_vm_ioctl_check_extension_generic, return 3 for
> > > KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 (better: define two constants
> > > KVM_DIRTY_LOG_MANUAL_PROTECT as 1 and
> KVM_DIRTY_LOG_INITIALLY_SET as
> > > 2).
> > >
> > > - in kvm_vm_ioctl_enable_cap_generic, allow bit 0 and bit 1 for
> > > cap->args[0]
> > >
> > > - in kvm_vm_ioctl_enable_cap_generic, check "if
> > > (!(kvm->manual_dirty_log_protect & KVM_DIRTY_LOG_INITIALLY_SET))".
> >
> > Thanks for the details! I'll add them in the next version.
> 
> I agree with Paolo that we'd better introduce a new bit for the change, because
> we don't know whether userspace has the assumption with a zeroed dirty
> bitmap as initial state (which is still part of the kernel ABI IIUC, actually that
> could be a good thing for some userspace).
> 
> Another question is that I see you only modified the PML path.  Could this also
> benefit the rest (say, SPTE write protects)?

Oh I missed the other path, thanks for reminding, I'll add it in V2.

Regards,
Jay Zhou

> 
> Thanks,
> 
> --
> Peter Xu