From: Paolo Bonzini
To: Peter Lieven
Cc: Orit Wasserman, Stefan Hajnoczi, qemu-devel@nongnu.org, quintela@redhat.com
Subject: Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations
Date: Fri, 22 Mar 2013 18:25:39 +0100
Message-ID: <514C9413.8090902@redhat.com>
In-Reply-To: <1363956370-23681-1-git-send-email-pl@kamp.de>
References: <1363956370-23681-1-git-send-email-pl@kamp.de>

On 22/03/2013 13:46, Peter Lieven wrote:
> This is v4 of my patch series with various optimizations in
> zero-buffer checking and migration tweaks.
>
> Thanks especially to Eric Blake for reviewing.
>
> v4:
> - do not inline buffer_find_nonzero_offset()
> - inline can_use_buffer_find_nonzero_offset() correctly
> - re-add the asserts in buffer_find_nonzero_offset(), as profiling
>   shows they do not hurt
> - replace the last occurrences of the scalar 8 with
>   BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
> - avoid dereferencing p already in patch 5, where we
>   know that the page (p) is zero
> - explicitly set bytes_sent = 0 if we skip a zero page;
>   bytes_sent was 0 before, but it was not obvious
> - add accounting information for skipped zero pages
> - fix errors reported by checkpatch.pl
>
> v3:
> - remove the asserts, inline the functions, and add a check
>   function that tells whether buffer_find_nonzero_offset() can be used
> - use the above check function in buffer_is_zero() and
>   find_next_bit()
> - use buffer_find_nonzero_offset() directly to find zero pages;
>   we know that all requirements are met for memory pages
> - fix a C89 violation in buffer_is_zero()
> - avoid dereferencing p in ram_save_block() if we already
>   know the page is zero
> - fix the initialization of last_offset in reset_ram_globals()
> - avoid skipping pages with offset == 0 in the bulk stage in
>   migration_bitmap_find_and_reset_dirty()
> - compared to v1, check for zero pages also after the bulk
>   RAM migration, as there are guests (e.g. Windows) that
>   zero out large amounts of memory while running
>
> v2:
> - fix the description, add a trivial zero check, and add asserts
>   to buffer_find_nonzero_offset()
> - add a constant for the unroll factor of buffer_find_nonzero_offset()
> - replace is_dup_page() with buffer_is_zero()
> - add test results to the XBZRLE patch
> - improve the patch descriptions
>
> Peter Lieven (9):
>   move vector definitions to qemu-common.h
>   cutils: add a function to find non-zero content in a buffer
>   buffer_is_zero: use vector optimizations if possible
>   bitops: use vector algorithm to optimize find_next_bit()
>   migration: search for zero instead of dup pages
>   migration: add an indicator for bulk state of ram migration
>   migration: do not sent zero pages in bulk stage
>   migration: do not search dirty pages in bulk stage
>   migration: use XBZRLE only after bulk stage
>
>  arch_init.c                   | 74 +++++++++++++++++++----------------
>  hmp.c                         |  2 ++
>  include/migration/migration.h |  2 ++
>  include/qemu-common.h         | 37 +++++++++++++++++++++
>  migration.c                   |  3 +-
>  qapi-schema.json              |  6 ++--
>  qmp-commands.hx               |  3 +-
>  util/bitops.c                 | 24 +++++++++++--
>  util/cutils.c                 | 50 ++++++++++++++++++++++++++++
>  9 files changed, 155 insertions(+), 46 deletions(-)

I think patch 4 is a bit overengineered.  I would prefer the simple
patch you had, using three/four non-vectorized accesses.  The setup
cost of the vectorized buffer_is_zero() is quite high, and 64 bits of
the dirty bitmap cover just 256 KiB of RAM; even when the host leaves a
256 KiB stretch untouched, the vectorized path still incurs that setup
overhead, while a plain word access would skip it cheaply.

I would like to see some more benchmarking for patch 5, but it looks ok.

The rest are fine, thanks!

Paolo
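
P.S.: to make the discussion above a bit more concrete, here is a rough,
scalar-only sketch of what buffer_find_nonzero_offset() and its
precondition check boil down to.  The names, the unroll constant and the
plain unsigned-long loop are illustrative only, not the actual code in
the series; the actual patches use vector types where the host supports
them.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* stand-in for BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR */
#define UNROLL_FACTOR 8

/* The fast path only works for buffers that are word-aligned and whose
 * length is a multiple of the unrolled chunk size. */
static bool can_use_find_nonzero(const void *buf, size_t len)
{
    return ((uintptr_t)buf % sizeof(unsigned long)) == 0 &&
           len % (UNROLL_FACTOR * sizeof(unsigned long)) == 0;
}

/* Return the offset up to which the buffer is known to be zero, at
 * chunk granularity; returns len if the whole buffer is zero.  The
 * caller must have checked can_use_find_nonzero() first. */
static size_t find_nonzero_offset(const void *buf, size_t len)
{
    const unsigned long *p = buf;
    const size_t chunk = UNROLL_FACTOR * sizeof(unsigned long);
    size_t i;

    for (i = 0; i < len; i += chunk, p += UNROLL_FACTOR) {
        unsigned long acc = 0;
        int j;

        for (j = 0; j < UNROLL_FACTOR; j++) {
            acc |= p[j];      /* OR the words together; nonzero => data */
        }
        if (acc) {
            return i;         /* first nonzero data lies in this chunk */
        }
    }
    return len;               /* buffer is entirely zero */
}

With that, a buffer_is_zero() along the lines of patch 3 is simply
"can_use_find_nonzero(buf, len) && find_nonzero_offset(buf, len) == len",
with a byte-wise fallback for buffers that do not meet the preconditions.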
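
And a rough sketch of the kind of non-vectorized fast path meant for
patch 4 (not the actual earlier patch; the helper name and the four-word
unroll are made up here): once the search is aligned to a word boundary,
skip clean (all-zero) words a few at a time.  The arithmetic assumes
4 KiB target pages, so each 64-bit word of the migration dirty bitmap
covers 64 * 4 KiB = 256 KiB of guest RAM.

#define BITS_PER_LONG (8 * sizeof(unsigned long))

/* Skip over all-zero words of the bitmap; 'bit' must already be
 * word-aligned and <= size.  Returns the (still word-aligned) position
 * from which the caller continues the usual bit-by-bit search. */
static unsigned long skip_zero_words(const unsigned long *addr,
                                     unsigned long bit, unsigned long size)
{
    while (size - bit >= 4 * BITS_PER_LONG) {
        const unsigned long *p = addr + bit / BITS_PER_LONG;

        /* four words at a time, i.e. up to 1 MiB of clean RAM skipped
         * per iteration, without any vector setup cost */
        if (p[0] | p[1] | p[2] | p[3]) {
            break;
        }
        bit += 4 * BITS_PER_LONG;
    }
    return bit;
}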