From: Paolo Bonzini
Date: Tue, 10 Nov 2015 10:30:26 +0100
Subject: Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization
To: quintela@redhat.com, "Li, Liang Z"
Cc: "amit.shah@redhat.com", "qemu-devel@nongnu.org", "mst@redhat.com"
Message-ID: <5641B932.6020701@redhat.com>
In-Reply-To: <87h9ku8bev.fsf@emacs.mitica>
References: <1447123907-26750-1-git-send-email-liang.z.li@intel.com> <564167C4.2060702@redhat.com> <87h9ku8bev.fsf@emacs.mitica>

On 10/11/2015 10:13, Juan Quintela wrote:
>> I rewrote buffer_find_nonzero_offset() using the 'bool memeqzero4_paolo length',
>> then wrote a test program that checks a large number of zero pages, and
>> used 'time' to record the time taken by each optimization.
>> The test results (time in seconds):
>>
>>                   | test 1 | test 2
>> ------------------+--------+--------
>> SSE2              | 13.696 | 13.533
>> AVX2              | 10.583 | 10.306
>> memeqzero4_paolo  |  9.718 |  9.817
>>
>> Paolo's implementation has the best performance. It seems that we can
>> remove the SSE2-related intrinsics.

Note that you can simplify my implementation a lot, because
buffer_find_nonzero_offset already assumes that the buffer is aligned to
sizeof(VECTYPE), i.e. 16 bytes. For example, you can just check the first
4 unsigned longs against zero and then call memcmp.

Paolo

> How should I understand that comment? That you are about to send an
> email to remove the SSE2 support and that I can forget about this patch?
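The simplification Paolo suggests (check the first four unsigned longs, then call memcmp) can be sketched roughly as below. This is a hypothetical helper, buffer_is_zero_sketch(), not QEMU's actual code, and it assumes the caller guarantees len >= 4 * sizeof(unsigned long). The trick: once the first k bytes are known to be zero, memcmp(buf, buf + k, len - k) == 0 means every later byte equals the byte k positions before it, so by induction the whole buffer is zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not QEMU's actual code: returns true if all
 * len bytes of buf are zero. Assumes len >= 4 * sizeof(unsigned long).
 */
static bool buffer_is_zero_sketch(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    const size_t prefix = 4 * sizeof(unsigned long);
    unsigned long w[4];

    assert(len >= prefix);

    /* Load the first four unsigned longs. memcpy avoids aliasing and
     * alignment pitfalls; compilers typically turn it into plain loads. */
    memcpy(w, p, sizeof(w));
    if (w[0] | w[1] | w[2] | w[3]) {
        return false;
    }

    /* The first `prefix` bytes are zero. If every byte also equals the
     * byte `prefix` positions before it, the whole buffer is zero. */
    return memcmp(p, p + prefix, len - prefix) == 0;
}
```

As a design note, this delegates the bulk of the scan to the C library's memcmp, which on glibc is usually dispatched at run time to a vectorized variant; that may be part of why a scalar-looking routine can compete with hand-written intrinsics, though the timings quoted above do not by themselves establish the cause.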