From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:47758)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1bjqek-0005oH-Fj for qemu-devel@nongnu.org;
	Tue, 13 Sep 2016 12:33:59 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1bjqef-0007zS-VC for qemu-devel@nongnu.org;
	Tue, 13 Sep 2016 12:33:58 -0400
Received: from mx1.redhat.com ([209.132.183.28]:58124)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
	id 1bjqef-0007yb-QU for qemu-devel@nongnu.org;
	Tue, 13 Sep 2016 12:33:53 -0400
References: <1473783005-113609-1-git-send-email-pbonzini@redhat.com>
	<1473783005-113609-11-git-send-email-pbonzini@redhat.com>
From: Paolo Bonzini
Message-ID: <2734bd7f-48d4-a20c-ab8c-16b56bb370ed@redhat.com>
Date: Tue, 13 Sep 2016 18:33:50 +0200
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 10/10] cutils: Rewrite x86 buffer zero checking
To: Richard Henderson , qemu-devel@nongnu.org

On 13/09/2016 18:27, Richard Henderson wrote:
> On 09/13/2016 09:10 AM, Paolo Bonzini wrote:
>> @@ -177,16 +231,15 @@ bool test_buffer_is_zero_next_accel(void)
>>
>>  static bool select_accel_fn(const void *buf, size_t len)
>>  {
>> -    uintptr_t ibuf = (uintptr_t)buf;
>>  #ifdef CONFIG_AVX2_OPT
>> -    if (len % 128 == 0 && ibuf % 32 == 0 && (cpuid_cache & CACHE_AVX2)) {
>> +    if (len >= 128 && (cpuid_cache & CACHE_AVX2)) {
>>          return buffer_zero_avx2(buf, len);
>>      }
>> -    if (len % 64 == 0 && ibuf % 16 == 0 && (cpuid_cache & CACHE_SSE4)) {
>> +    if (len >= 64 && (cpuid_cache & CACHE_SSE4)) {
>>          return buffer_zero_sse4(buf, len);
>>      }
>>  #endif
>> -    if (len % 64 == 0 && ibuf % 16 == 0 && (cpuid_cache & CACHE_SSE2)) {
>> +    if (len >= 64 && (cpuid_cache & CACHE_SSE2)) {
>>          return buffer_zero_sse2(buf, len);
>>      }
>
> You've dropped a major change to select_accel_fn here.
>
> (1) The avx2 routine, as written, can support len >= 64, therefore a common
> test works for all of the vectorized functions.
>
> (2) I had saved the pointer to the routine, so that we didn't have to
> repeatedly test multiple cpuid_cache bits.

Can you send a replacement for this patch only?

Thanks,

Paolo
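
For readers following the thread, points (1) and (2) in the quoted review could look roughly like the sketch below. It is not the code from either version of the patch. The CACHE_* values, the init_accel() helper, the buffer_accel variable and the scalar stand-ins for the vector routines are all assumptions made for illustration; only the names select_accel_fn, cpuid_cache, CACHE_* and buffer_zero_* come from the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed feature bits: names follow the quoted diff, values are made up. */
#define CACHE_SSE2 1u
#define CACHE_SSE4 2u
#define CACHE_AVX2 4u

typedef bool (*accel_fn)(const void *buf, size_t len);

/* Portable byte loop, used here in place of the real routines. */
static bool buffer_zero_scalar(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Hypothetical stand-ins: the real versions would use vector intrinsics. */
static bool buffer_zero_sse2(const void *buf, size_t len)
{
    return buffer_zero_scalar(buf, len);
}

static bool buffer_zero_sse4(const void *buf, size_t len)
{
    return buffer_zero_scalar(buf, len);
}

static bool buffer_zero_avx2(const void *buf, size_t len)
{
    return buffer_zero_scalar(buf, len);
}

/* Point (2): the routine is chosen once and the pointer is saved. */
static accel_fn buffer_accel = buffer_zero_scalar;

/* Hypothetical helper: pick the best routine for the given feature bits. */
static void init_accel(unsigned cpuid_cache)
{
    if (cpuid_cache & CACHE_AVX2) {
        buffer_accel = buffer_zero_avx2;
    } else if (cpuid_cache & CACHE_SSE4) {
        buffer_accel = buffer_zero_sse4;
    } else if (cpuid_cache & CACHE_SSE2) {
        buffer_accel = buffer_zero_sse2;
    } else {
        buffer_accel = buffer_zero_scalar;
    }
}

/* Point (1): one common length test covers every vectorized routine. */
static bool select_accel_fn(const void *buf, size_t len)
{
    assert(buffer_accel != NULL);
    if (len >= 64) {
        return buffer_accel(buf, len);
    }
    /* Short buffers take the scalar path in this sketch. */
    return buffer_zero_scalar(buf, len);
}
```

With this shape, init_accel() would run once at startup with the detected feature bits, and every later select_accel_fn() call costs one length comparison and one indirect call, without re-testing cpuid_cache.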