From: Paolo Bonzini
Date: Fri, 22 Mar 2013 22:24:06 +0100
Message-ID: <514CCBF6.1030903@redhat.com>
In-Reply-To: <514CAEFD.7090904@kamp.de>
References: <1363956370-23681-1-git-send-email-pl@kamp.de> <514C9413.8090902@redhat.com> <514CAEFD.7090904@kamp.de>
Subject: Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations
To: Peter Lieven
Cc: Orit Wasserman, quintela@redhat.com, qemu-devel@nongnu.org, Stefan Hajnoczi

On 22/03/2013 20:20, Peter Lieven wrote:
>> I think patch 4 is a bit overengineered. I would prefer the simple
>> patch you had using three/four non-vectorized accesses. The setup cost
>> of the vectorized buffer_is_zero is quite high, and 64 bits are just
>> 256k RAM; if the host doesn't touch 256k RAM, it will incur the overhead.
> I think you are right. I was a little too eager to utilize buffer_find_nonzero_offset()
> as much as possible. The performance gain from unrolling was impressive enough.
> The gain from the vector functions is not big enough to justify a possible
> slowdown from the high setup costs.
> My testing revealed that in most cases buffer_find_nonzero_offset()
> returns 0 or a big offset. All the calls returning 0 would have incurred higher setup costs with
> the vectorized version of patch 4.
>
>>
>> I would prefer some more benchmarking for patch 5, but it looks ok.
> What would you like to see? Statistics on how many pages of a real system
> are not zero, but zero in the first sizeof(long) bytes?

Yeah, more or less. Running the system for a while, migrating, and
plotting a histogram of the return values of buffer_find_nonzero_offset
(hmm, perhaps using a nonvectorized version is better for this experiment).

Paolo
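
To make the experiment concrete, here is a rough sketch of what I have in mind. It is not QEMU code: find_nonzero_offset_scalar(), histogram_add(), histogram_print(), GUEST_PAGE_SIZE and the 512-byte bucket width are all made-up names and values for illustration. The scan is a plain word-at-a-time loop rather than the vectorized function, and offset 0 gets its own bucket so that pages that are nonzero in the very first word can be told apart from the rest.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* All names and sizes below are assumptions made for this sketch. */
#define GUEST_PAGE_SIZE 4096u                 /* assumed page size */
#define BUCKET_BYTES    512u                  /* assumed bucket width */
#define NBUCKETS        (GUEST_PAGE_SIZE / BUCKET_BYTES + 2)

/*
 * Scalar stand-in for buffer_find_nonzero_offset(): returns the byte
 * offset of the first nonzero word, or len if the buffer is all zero.
 * len must be a multiple of sizeof(unsigned long).
 */
size_t find_nonzero_offset_scalar(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    size_t off;

    for (off = 0; off < len; off += sizeof(unsigned long)) {
        unsigned long w;

        /* memcpy avoids any assumption about the alignment of buf */
        memcpy(&w, p + off, sizeof(w));
        if (w != 0) {
            return off;
        }
    }
    return len;
}

/*
 * Account one page in the histogram:
 *   hist[0]            first word is already nonzero (offset 0)
 *   hist[1..8]         first nonzero word lies in a 512-byte range
 *   hist[NBUCKETS - 1] page is completely zero
 */
void histogram_add(unsigned long hist[NBUCKETS], const void *page)
{
    size_t off = find_nonzero_offset_scalar(page, GUEST_PAGE_SIZE);
    size_t idx;

    if (off == 0) {
        idx = 0;
    } else if (off == GUEST_PAGE_SIZE) {
        idx = NBUCKETS - 1;
    } else {
        idx = 1 + off / BUCKET_BYTES;
    }
    hist[idx]++;
}

/* Print one line per bucket; the first range excludes offset 0. */
void histogram_print(const unsigned long hist[NBUCKETS])
{
    size_t i;

    printf("offset 0: %lu\n", hist[0]);
    for (i = 1; i < NBUCKETS - 1; i++) {
        printf("offset %zu..%zu: %lu\n",
               (i - 1) * BUCKET_BYTES, i * BUCKET_BYTES - 1, hist[i]);
    }
    printf("all zero: %lu\n", hist[NBUCKETS - 1]);
}
```

The idea would be to call histogram_add() for every page looked at during migration and histogram_print() at the end. If nearly everything lands in the first and last buckets, that would support keeping the cheap scalar check up front.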