From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:38695)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1Zw5G9-0006lp-Ob
	for qemu-devel@nongnu.org; Tue, 10 Nov 2015 04:30:42 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1Zw5G5-0006zW-IJ
	for qemu-devel@nongnu.org; Tue, 10 Nov 2015 04:30:37 -0500
Received: from mx1.redhat.com ([209.132.183.28]:39881)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <pbonzini@redhat.com>) id 1Zw5G5-0006zG-38
	for qemu-devel@nongnu.org; Tue, 10 Nov 2015 04:30:33 -0500
References: <1447123907-26750-1-git-send-email-liang.z.li@intel.com>
	<564167C4.2060702@redhat.com>
	<F2CBF3009FA73547804AE4C663CAB28E019A2935@shsmsx102.ccr.corp.intel.com>
	<87h9ku8bev.fsf@emacs.mitica>
From: Paolo Bonzini <pbonzini@redhat.com>
Message-ID: <5641B932.6020701@redhat.com>
Date: Tue, 10 Nov 2015 10:30:26 +0100
MIME-Version: 1.0
In-Reply-To: <87h9ku8bev.fsf@emacs.mitica>
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [v2 0/2] add avx2 instruction optimization
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: quintela@redhat.com, "Li, Liang Z" <liang.z.li@intel.com>
Cc: "amit.shah@redhat.com" <amit.shah@redhat.com>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, "mst@redhat.com" <mst@redhat.com>


On 10/11/2015 10:13, Juan Quintela wrote:
>> > I rewrote buffer_find_nonzero_offset() using the 'bool memeqzero4_paolo'
>> > implementation, then wrote a test program that checks a large number of
>> > zero pages and used 'time' to record the time taken by each
>> > optimization. The test results are as follows:
>> >
>> > Time (s)          | test 1 | test 2
>> > ------------------+--------+-------
>> > SSE2              | 13.696 | 13.533
>> > AVX2              | 10.583 | 10.306
>> > memeqzero4_paolo  |  9.718 |  9.817
>> >
>> >
>> > Paolo's implementation has the best performance. It seems that we can
>> > remove the SSE2-related intrinsics.

Note that you can simplify my implementation a lot, because
buffer_find_nonzero_offset already assumes that the buffer is aligned to
sizeof(VECTYPE), i.e. 16 bytes.  For example you can just check the
first 4 unsigned longs against zero and then call memcmp.
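
A minimal sketch of that idea in C follows. It is only an illustration, not
the code in QEMU: the name buffer_is_zero_sketch is hypothetical, and the
short-buffer fallback is an assumption added so the function is safe for any
length. Once the first 4 unsigned longs are known to be zero, comparing the
buffer against itself shifted by that many bytes shows that every later byte
equals an earlier one, and therefore that the whole buffer is zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not QEMU's actual implementation).
 * Returns true if all len bytes starting at buf are zero.
 */
static bool buffer_is_zero_sketch(const void *buf, size_t len)
{
    const unsigned char *bytes = buf;
    unsigned long head[4];
    size_t i;

    /* Assumption: buffers shorter than the head are checked bytewise. */
    if (len < sizeof(head)) {
        for (i = 0; i < len; i++) {
            if (bytes[i] != 0) {
                return false;
            }
        }
        return true;
    }

    /*
     * Load the first 4 unsigned longs. memcpy avoids aliasing problems;
     * for an aligned buffer compilers typically emit plain loads.
     */
    memcpy(head, bytes, sizeof(head));
    if ((head[0] | head[1] | head[2] | head[3]) != 0) {
        return false;
    }

    /*
     * Compare the buffer with itself shifted by sizeof(head) bytes.
     * Equality means bytes[i] == bytes[i + sizeof(head)] for every i,
     * and the first sizeof(head) bytes are zero, so all bytes are zero.
     */
    return memcmp(bytes, bytes + sizeof(head), len - sizeof(head)) == 0;
}
```

Calling buffer_is_zero_sketch(page, 4096) on a zero page walks the whole
page inside memcmp, so the speed depends on how well the C library's memcmp
is optimized on the target machine.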

Paolo

> How should I understand that comment?  That you are about to send an
> email to remove the sse2 support and that I can forget about this patch?