From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:38432)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1UJIwz-0002ll-6k for qemu-devel@nongnu.org;
	Sat, 23 Mar 2013 03:33:15 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1UJIww-0007ps-Q7 for qemu-devel@nongnu.org;
	Sat, 23 Mar 2013 03:33:13 -0400
Received: from ssl.dlhnet.de ([91.198.192.8]:53592 helo=ssl.dlh.net)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1UJIww-0007pf-2Q for qemu-devel@nongnu.org;
	Sat, 23 Mar 2013 03:33:10 -0400
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\))
From: Peter Lieven
In-Reply-To: <514CCBF6.1030903@redhat.com>
Date: Sat, 23 Mar 2013 08:34:40 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <4F4E59F8-475D-445D-AC26-3ECB3B2D0DC6@kamp.de>
References: <1363956370-23681-1-git-send-email-pl@kamp.de>
	<514C9413.8090902@redhat.com> <514CAEFD.7090904@kamp.de>
	<514CCBF6.1030903@redhat.com>
Subject: Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations
To: Paolo Bonzini
Cc: Orit Wasserman, quintela@redhat.com, qemu-devel@nongnu.org,
	Stefan Hajnoczi

On 22.03.2013 at 22:24, Paolo Bonzini wrote:

> On 22/03/2013 20:20, Peter Lieven wrote:
>>> I think patch 4 is a bit overengineered.  I would prefer the simple
>>> patch you had using three/four non-vectorized accesses.  The setup cost
>>> of the vectorized buffer_is_zero is quite high, and 64 bits are just
>>> 256k RAM; if the host doesn't touch 256k RAM, it will incur the overhead.
>> I think you are right. I was a little too eager to utilize
>> buffer_find_nonzero_offset() as much as possible. The performance gain
>> from unrolling was impressive enough. The gain from the vector functions
>> is not big enough to justify a possible slowdown caused by the high
>> setup costs. My tests revealed that in most cases
>> buffer_find_nonzero_offset() returns 0 or a big offset. All the 0 return
>> values would have incurred increased setup costs with the vectorized
>> version of patch 4.
>>
>>>
>>> I would prefer some more benchmarking for patch 5, but it looks ok.
>> What would you like to see? Statistics on how many pages of a real system
>> are not zero, but zero in the first sizeof(long) bytes?
>
> Yeah, more or less.  Running the system for a while, migrating, and
> plotting a histogram of the return values of buffer_find_nonzero_offset
> (hmm, perhaps using a nonvectorized version is better for this experiment).

I will follow up on this on Monday. Have you seen my concern that the
whole page is read anyway if it is non-zero?

Peter

>
> Paolo
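
A minimal sketch of the experiment discussed above, for illustration only.
It assumes a 4096-byte page and 512-byte histogram bins, and it uses a
plain scalar loop as a stand-in for buffer_find_nonzero_offset() (the
real function works on vector-sized chunks, so its return values are
coarser). All names below (scalar_find_nonzero_offset, offset_to_bin,
record_page, print_histogram, SKETCH_*) are hypothetical and not part of
QEMU or of the patch series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */
#define SKETCH_BIN_BYTES 512u    /* assumed histogram bin width */
/* bin 0: first long is non-zero; bins 1..8: later offsets; last bin: zero page */
#define SKETCH_NBINS (2 + SKETCH_PAGE_SIZE / SKETCH_BIN_BYTES)

/*
 * Scalar stand-in for buffer_find_nonzero_offset(): returns the byte
 * offset of the first non-zero long, or len if the buffer is all zero.
 * buf must be long-aligned and len a multiple of sizeof(long).
 */
static size_t scalar_find_nonzero_offset(const void *buf, size_t len)
{
    const unsigned long *p = buf;
    size_t i, n = len / sizeof(unsigned long);

    assert(len % sizeof(unsigned long) == 0);
    for (i = 0; i < n; i++) {
        if (p[i] != 0) {
            return i * sizeof(unsigned long);
        }
    }
    return len;
}

/* Map a return value to a histogram bin. */
static size_t offset_to_bin(size_t off)
{
    if (off == 0) {
        return 0;                       /* caught by a first-long check */
    }
    if (off >= SKETCH_PAGE_SIZE) {
        return SKETCH_NBINS - 1;        /* whole page is zero */
    }
    return 1 + off / SKETCH_BIN_BYTES;  /* non-zero, but first long is zero */
}

/* Account one page, e.g. called once per page during a migration pass. */
static void record_page(unsigned long hist[SKETCH_NBINS], const void *page)
{
    hist[offset_to_bin(scalar_find_nonzero_offset(page, SKETCH_PAGE_SIZE))]++;
}

static void print_histogram(const unsigned long hist[SKETCH_NBINS])
{
    size_t i;

    printf("offset 0 (first long non-zero): %lu\n", hist[0]);
    for (i = 1; i + 1 < SKETCH_NBINS; i++) {
        printf("first non-zero below %4u: %lu\n",
               (unsigned)(i * SKETCH_BIN_BYTES), hist[i]);
    }
    printf("zero page: %lu\n", hist[SKETCH_NBINS - 1]);
}
```

Bin 0 counts the pages that a check of the first sizeof(long) bytes
would already reject, the last bin counts zero pages, and the bins in
between are the "not zero, but zero in the first sizeof(long) bytes"
pages asked about above.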