From: Paolo Bonzini
Date: Mon, 25 Mar 2013 15:34:16 +0100
Message-ID: <51506068.5080103@redhat.com>
In-Reply-To: <806A8BFB-FF1F-482C-B679-2B1B10D06D7C@kamp.de>
References: <972929461.13095041.1364216522903.JavaMail.root@redhat.com> <4E89AD05-F328-493A-9C31-E52A033420B1@kamp.de> <806A8BFB-FF1F-482C-B679-2B1B10D06D7C@kamp.de>
Subject: Re: [Qemu-devel] [PATCHv4 0/9] buffer_is_zero / migration optimizations
To: Peter Lieven
Cc: Stefan Hajnoczi, Orit Wasserman, qemu-devel@nongnu.org, quintela@redhat.com

On 25/03/2013 14:32, Peter Lieven wrote:
>
> On 25.03.2013 at 14:23, Peter Lieven wrote:
>
>>
>> On 25.03.2013 at 14:02, Paolo Bonzini wrote:
>>
>>>> Maybe I should have explained the output in more detail. The percentages
>>>> are cumulative. 35.8% in the second-to-last column means that
>>>> 35.8% have a return value that is less than TARGET_PAGE_SIZE.
>>>> This was meant to illustrate at how many 64-bit chunks you have
>>>> to look to grab a certain percentage of non-zero pages.
>>>
>>> Ok, I wrongly understood that many pages had 4088 zero bytes but
>>> the last 8 were not zero. Now it's clearer, and more logical too. :)
>>>
>>>> Looking e.g. at the third value it means that looking at the first
>>>> three 64-bit chunks it will catch 34.0% of all pages.
>>>> It turns out that the non-zeroness of a page can be detected looking
>>>> at the first 256 or so bits and only a low
>>>> percentage turns out to be non-zero at a later position. So after
>>>> having checked the first chunks one by one
>>>> there is no big penalty looking at the remaining chunks with the
>>>> vectorized loop.
>>>
>>> I think it makes most sense to unroll the first four non-vectorized
>>> iterations, i.e. not use SSE and use three or four ifs. Either:
>>>
>>>     if (foo[0]) return 0;
>>>     if (foo[1]) return 8;
>>>     if (foo[2]) return 16;
>>>     if (foo[3]) return 24;
>>>
>>> or
>>>
>>>     if (foo[0]) return 0;
>>>     if (foo[1] | foo[2] | foo[3]) return 8;
>>>
>>> and then proceed on the remaining 4096-4*sizeof(long) bytes with
>>> the vectorized loop. foo+4 is aligned for SIMD operations on both
>>> 32- and 64-bit machines, which makes this a nice choice.
>>
>> i can't start at foo+4 since the remaining X-4*sizeof(long) bytes
>> are not divisible by 8*sizeof(VECTYPE).

Hmm, right. What about just processing the first few longs twice, i.e.
the above followed by "for (i = 0; i < len / sizeof(VECTYPE); i
+= BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR)"?

Paolo

>>
>> for (i = BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR;
>>      i < len / sizeof(VECTYPE);
>>      i += BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
>>     …
>> }
>
> performance of the above is bad compared to:
>
>     for (i = 0; i < BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR; i++) {
>         if (!ALL_EQ(p[i], zero)) {
>             return i * sizeof(VECTYPE);
>         }
>     }
>
> …
>
> The above is basically what old is_dup_page is doing, but after the first
> 8 iterations the optimized version kicks in.
>
> Peter
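
To make the "process the first few longs twice" idea concrete, here is a
rough, untested-in-QEMU sketch. It is not the code from the patch series:
find_nonzero_offset_sketch, vec_t and UNROLL_FACTOR are names made up for
illustration, and vec_t is a plain unsigned long standing in for VECTYPE so
that it compiles without SIMD support. It checks the first four longs with
scalar ifs and then runs the unrolled loop from i = 0 over the whole buffer,
so the loop bounds stay a multiple of the unroll factor.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: stands in for VECTYPE; a real build would use a SIMD type. */
typedef unsigned long vec_t;

/* Assumption: stands in for BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR. */
#define UNROLL_FACTOR 8

/*
 * Returns an offset at or before the first non-zero byte, or len if the
 * whole buffer is zero. Like the loop discussed in the thread, the result
 * is only exact to the start of the chunk or group that tested non-zero.
 *
 * Preconditions (assumed, not checked beyond the assert): buf is suitably
 * aligned for vec_t and len is a non-zero multiple of
 * UNROLL_FACTOR * sizeof(vec_t).
 */
static size_t find_nonzero_offset_sketch(const unsigned long *buf, size_t len)
{
    const vec_t *p = buf;
    size_t i, j;

    assert(len > 0 && len % (UNROLL_FACTOR * sizeof(vec_t)) == 0);

    /* Scalar fast path: most non-zero pages are caught here. */
    if (buf[0]) return 0 * sizeof(long);
    if (buf[1]) return 1 * sizeof(long);
    if (buf[2]) return 2 * sizeof(long);
    if (buf[3]) return 3 * sizeof(long);

    /*
     * Unrolled loop starting at 0 again: the first four longs are
     * re-read, which costs little and keeps the trip count a multiple
     * of UNROLL_FACTOR.
     */
    for (i = 0; i < len / sizeof(vec_t); i += UNROLL_FACTOR) {
        vec_t acc = 0;
        for (j = 0; j < UNROLL_FACTOR; j++) {
            acc |= p[i + j];
        }
        if (acc) {
            return i * sizeof(vec_t);
        }
    }
    return len;
}
```

Whether the redundant reads are cheaper than Peter's one-by-one ALL_EQ loop
over the first group would have to be measured; the sketch only shows that
the divisibility problem goes away.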