From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:46213) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bjqZT-0001SA-Iy for qemu-devel@nongnu.org; Tue, 13 Sep 2016 12:28:32 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bjqZP-00006R-Jj for qemu-devel@nongnu.org; Tue, 13 Sep 2016 12:28:31 -0400
Received: from mail-yb0-f178.google.com ([209.85.213.178]:35948) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bjqZP-0007ra-Fn for qemu-devel@nongnu.org; Tue, 13 Sep 2016 12:28:27 -0400
Received: by mail-yb0-f178.google.com with SMTP id u125so63342456ybg.3 for ; Tue, 13 Sep 2016 09:28:06 -0700 (PDT)
Sender: Richard Henderson
References: <1473783005-113609-1-git-send-email-pbonzini@redhat.com> <1473783005-113609-11-git-send-email-pbonzini@redhat.com>
From: Richard Henderson
Message-ID:
Date: Tue, 13 Sep 2016 09:27:02 -0700
MIME-Version: 1.0
In-Reply-To: <1473783005-113609-11-git-send-email-pbonzini@redhat.com>
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 10/10] cutils: Rewrite x86 buffer zero checking
List-Id:
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Paolo Bonzini , qemu-devel@nongnu.org

On 09/13/2016 09:10 AM, Paolo Bonzini wrote:
> @@ -177,16 +231,15 @@ bool test_buffer_is_zero_next_accel(void)
>
>  static bool select_accel_fn(const void *buf, size_t len)
>  {
> -    uintptr_t ibuf = (uintptr_t)buf;
>  #ifdef CONFIG_AVX2_OPT
> -    if (len % 128 == 0 && ibuf % 32 == 0 && (cpuid_cache & CACHE_AVX2)) {
> +    if (len >= 128 && (cpuid_cache & CACHE_AVX2)) {
>          return buffer_zero_avx2(buf, len);
>      }
> -    if (len % 64 == 0 && ibuf % 16 == 0 && (cpuid_cache & CACHE_SSE4)) {
> +    if (len >= 64 && (cpuid_cache & CACHE_SSE4)) {
>          return buffer_zero_sse4(buf, len);
>      }
>  #endif
> -    if (len % 64 == 0 && ibuf % 16 == 0 && (cpuid_cache & CACHE_SSE2)) {
> +    if (len >= 64 && (cpuid_cache & CACHE_SSE2)) {
>          return buffer_zero_sse2(buf, len);
>      }

You've dropped a major change to select_accel_fn here.

(1) The avx2 routine, as written, can support len >= 64, therefore a common
    test works for all of the vectorized functions.

(2) I had saved the pointer to the routine, so that we didn't have to
    repeatedly test multiple cpuid_cache bits.

r~
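
For illustration, points (1) and (2) could be combined roughly as follows. This is only a sketch, not the code from either version of the patch: the FEAT_* bits, init_accel(), accel_fn and the *_stub routines are hypothetical stand-ins, and the stubs simply call a scalar loop so that the example builds on any host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical feature bits, standing in for the CACHE_* flags. */
enum { FEAT_SSE2 = 1, FEAT_SSE4 = 2, FEAT_AVX2 = 4 };

typedef bool (*zero_fn)(const void *buf, size_t len);

/* Portable byte loop; also used for buffers below the vector threshold. */
static bool buffer_zero_scalar(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    for (size_t i = 0; i < len; i++) {
        if (p[i]) {
            return false;
        }
    }
    return true;
}

/* Stand-ins for the vectorized routines (assumption: real ones use SIMD). */
static bool zero_sse2_stub(const void *buf, size_t len)
{
    return buffer_zero_scalar(buf, len);
}

static bool zero_sse4_stub(const void *buf, size_t len)
{
    return buffer_zero_scalar(buf, len);
}

static bool zero_avx2_stub(const void *buf, size_t len)
{
    return buffer_zero_scalar(buf, len);
}

/* Saved once at init time, so the hot path never re-tests feature bits. */
static zero_fn accel_fn = buffer_zero_scalar;

static void init_accel(unsigned features)
{
    if (features & FEAT_AVX2) {
        accel_fn = zero_avx2_stub;
    } else if (features & FEAT_SSE4) {
        accel_fn = zero_sse4_stub;
    } else if (features & FEAT_SSE2) {
        accel_fn = zero_sse2_stub;
    } else {
        accel_fn = buffer_zero_scalar;
    }
}

static bool select_accel_fn(const void *buf, size_t len)
{
    assert(accel_fn != NULL);
    /* One common length test covers every vectorized routine. */
    if (len >= 64) {
        return accel_fn(buf, len);
    }
    return buffer_zero_scalar(buf, len);
}
```

With this shape, the feature bits are examined only in init_accel(), and select_accel_fn() reduces to a single length comparison plus an indirect call.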