From mboxrd@z Thu Jan 1 00:00:00 1970
From: Avi Kivity
Subject: Re: Checking guest memory pages changes from host userspace
Date: Mon, 22 Jun 2009 14:38:01 +0300
Message-ID: <4A3F6D19.50609@redhat.com>
References: <18C018878FB0244EB71B7FE328978A32679FD52B@rrsmsx503.amr.corp.intel.com>
 <4A3E5706.9070408@redhat.com>
 <3574F699-DC93-41EB-9ABC-F246CCE28203@suse.de>
 <4A3E9186.8020303@redhat.com>
 <4A3F45C1.4000201@redhat.com>
 <5BE1911C-4100-46ED-99A5-A57AAB256AA4@suse.de>
 <4A3F538B.1040607@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Cc: "Passera, Pablo R", "kvm@vger.kernel.org"
To: Alexander Graf
Return-path:
Received: from mx2.redhat.com ([66.187.237.31]:41180 "EHLO mx2.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
 id S1754016AbZFVLhQ (ORCPT ); Mon, 22 Jun 2009 07:37:16 -0400
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

On 06/22/2009 12:57 PM, Alexander Graf wrote:
>>> Yeah, the current implementation is probably the fastest you'll get.
>>> I didn't want to slow down shadow page setup due to the dirty
>>> update, but I guess compared to the rest of the overhead that
>>> doesn't really weight as much.
>>
>> I didn't explain myself well, I now think using the dirty bits is better.
>>
>> Currently we do the following:
>> 1. sweep all sptes to drop write permissions
>
> sweep = flush / remove from spt?

sweep = iterate over all (dropping write permissions from each spte)

>> 2. on write faults, mark the page dirty
>> 3. retrieve the log
>>
>> We could do instead:
>> 1. sweep all sptes to drop the dirty bit
>
> sweep = modify pte to set dirty=0?

sweep = iterate over all (dropping dirty bits)

>> 2. on writes, set the dirty bit (the cpu does this)
>> 3. sweep all sptes to read the dirty bit, and return the log
>>
>> Since step 1 occurs after step 3 of the previous iteration, we could
>> merge them, and lose nothing.
>
> Hm - so in both cases we need to loop through all PTEs anyways,
> because we need to either remove/unset dirty them?

Yes. Although for the write-protect case, we could alternatively look
at the bitmap to see which sptes we need to drop.

>
> Then it really does make sense to use the dirty bit :-).
> Also doing a #vmexit is rather expensive, so I'd rather loop through
> 1000 entries in the host context than taking 10 #vmexits. And dirty
> bits don't #vmexit.

It's not that trivial. A #vmexit is about 2000 cycles (including mmu
code), while a cache miss is 100-200 cycles. So if we don't scan the
sptes carefully, the cache miss cost could be greater.

> Maybe it'd make sense to use the higher order PTE dirty bits too (do
> they have dirty bits on x86?) to not loop through all PTEs to generate
> the dirty map. In most cases it'll be 0 anyways.

There are no higher dirty bits, but we can write protect the higher
level. I'm not sure it's worthwhile; if 1% of memory is dirty, but it's
scattered randomly, then all 2MB ranges will be dirty.

> That way we'd save 90% of the loop time, because we only need to check
> a couple of 2/4mb pte entries.

You have a 4MB guest?

Okay, you're only considering the vga tracking. I don't think that's a
problem in practice, worst case is a few hundred faults in a 30ms time
period.

-- 
error compiling committee.c: too many arguments to function
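
A rough back-of-the-envelope sketch of the two numeric arguments above
(fault cost vs. sweep cost, and scattered dirty pages vs. 2MB ranges).
Only the 2000-cycle #vmexit and the 100-200 cycle cache miss figures
come from this mail. The 8-byte spte size, the 64-byte cache line, the
independence of dirty pages and all function names are assumptions made
for illustration, not measurements or KVM code.

```python
# Figures quoted in the mail.
VMEXIT_CYCLES = 2000             # "about 2000 cycles (including mmu code)"
CACHE_MISS_CYCLES = (100, 200)   # "100-200 cycles"

# Assumptions, not from the mail: 8-byte sptes packed in 64-byte cache lines.
SPTE_BYTES = 8
CACHE_LINE_BYTES = 64
SPTES_PER_LINE = CACHE_LINE_BYTES // SPTE_BYTES   # 8
PAGES_PER_2MB = (2 * 1024 * 1024) // 4096         # 512 4KB pages


def fault_cost(n_faults):
    """Cycles spent on write faults, one #vmexit each."""
    return n_faults * VMEXIT_CYCLES


def scan_cost(n_sptes, miss_cycles, sptes_per_miss=1):
    """Cycles for a sweep in which every `sptes_per_miss` sptes cost one
    cache miss (1 = scattered walk, SPTES_PER_LINE = packed, sequential)."""
    misses = -(-n_sptes // sptes_per_miss)  # ceiling division
    return misses * miss_cycles


def fraction_of_2mb_ranges_dirty(p_dirty):
    """Chance that a 2MB range holds at least one dirty 4KB page, assuming
    each page is dirty independently with probability p_dirty."""
    return 1.0 - (1.0 - p_dirty) ** PAGES_PER_2MB


if __name__ == "__main__":
    print("10 faults:", fault_cost(10), "cycles")
    for miss in CACHE_MISS_CYCLES:
        print("1000 sptes, scattered, miss =", miss, ":",
              scan_cost(1000, miss), "cycles")
        print("1000 sptes, packed, miss =", miss, ":",
              scan_cost(1000, miss, SPTES_PER_LINE), "cycles")
    print("2MB ranges dirty at 1%:", fraction_of_2mb_ranges_dirty(0.01))
```

Under these assumptions, ten faults cost 20,000 cycles, while sweeping
1000 sptes costs 100,000 to 200,000 cycles if every spte misses the
cache and 12,500 to 25,000 cycles if the sptes are walked one cache line
at a time, so the comparison depends on how the scan is done. With 1% of
pages dirty at random, more than 99% of 2MB ranges contain a dirty page.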