From: Peter Lieven
To: Eric Blake
Cc: qemu-devel@nongnu.org
Date: Tue, 19 Mar 2013 20:40:50 +0100
Subject: Re: [Qemu-devel] [PATCHv2 4/9] bitops: use vector algorithm to optimize find_next_bit()
Message-Id: <082932DE-A201-41F9-A51F-141B6A13D39A@kamp.de>
In-Reply-To: <51489715.7050103@redhat.com>

On 19.03.2013 at 17:49, Eric Blake wrote:

> On 03/15/2013 09:50 AM, Peter Lieven wrote:
>> this patch adds the usage of buffer_find_nonzero_offset()
>> to skip large areas of zeroes.
>>
>> compared to loop unrolling presented in an earlier
>> patch this adds another 50% performance benefit for
>> skipping large areas of zeroes. loop unrolling alone
>> added close to 100% speedup.
>>
>> Signed-off-by: Peter Lieven
>> ---
>>  util/bitops.c | 26 +++++++++++++++++++++++---
>>  1 file changed, 23 insertions(+), 3 deletions(-)
>
>> +    while (size >= BITS_PER_LONG) {
>> +        if ((tmp = *p)) {
>> +            goto found_middle;
>> +        }
>> +        if (((uintptr_t) p) % sizeof(VECTYPE) == 0
>> +                && size >= BITS_PER_BYTE * sizeof(VECTYPE)
>> +                   * BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
>
> Another instance where a helper function to check for alignment would be
> nice. Except this time you have a BITS_PER_BYTE factor, so you would be
> calling something like buffer_can_use_vectors(buf, size / BITS_PER_BYTE)
>
>> +            unsigned long tmp2 =
>> +                buffer_find_nonzero_offset(p, ((size / BITS_PER_BYTE) &
>> +                           ~(BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR *
>> +                             sizeof(VECTYPE) - 1)));
>
> Type mismatch - buffer_find_nonzero_offset returns size_t, which isn't
> necessarily the same size as unsigned long. I'm not sure if it can bite
> you.

I will look into it.

>
>> +            result += tmp2 * BITS_PER_BYTE;
>> +            size -= tmp2 * BITS_PER_BYTE;
>> +            p += tmp2 / sizeof(unsigned long);
>> +            if (!size) {
>> +                return result;
>> +            }
>> +            if (tmp2) {
>
> Do you really need this condition, or would it suffice to just
> 'continue;' the loop? Once buffer_find_nonzero_offset returns anything
> that leaves size as non-zero, we are guaranteed that the loop will goto
> found_middle without any further calls to buffer_find_nonzero_offset.

Not in all cases. It will if the nonzero content is in the first
sizeof(unsigned long) bytes. If not, buffer_find_nonzero_offset() is
called again, and it returns 0 because there is a non-zero byte somewhere
in the first sizeof(VECTYPE) * BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR
bytes. I added this check to avoid that redundant call.
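To illustrate the case above, here is a minimal scalar sketch. It is not
the actual QEMU code: model_find_nonzero_offset() and
model_can_use_vectors() are hypothetical stand-ins, and VEC_BYTES and
CHUNK_BYTES are assumed values for sizeof(VECTYPE) and
sizeof(VECTYPE) * BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR. The only
property the sketch relies on is that the search works at chunk
granularity, so a non-zero byte anywhere in the first chunk yields offset 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values, for illustration only: 16-byte vectors, unrolled 8x. */
#define VEC_BYTES   16
#define CHUNK_BYTES (VEC_BYTES * 8)

/*
 * Hypothetical scalar model of a chunk-granular search: returns the byte
 * offset of the first CHUNK_BYTES-sized chunk that contains a non-zero
 * byte, or len if the whole buffer is zero. len must be a multiple of
 * CHUNK_BYTES.
 */
static size_t model_find_nonzero_offset(const void *buf, size_t len)
{
    const unsigned char *b = buf;
    size_t off, i;

    assert(len % CHUNK_BYTES == 0);
    for (off = 0; off < len; off += CHUNK_BYTES) {
        for (i = 0; i < CHUNK_BYTES; i++) {
            if (b[off + i]) {
                return off;   /* chunk start, not the exact byte */
            }
        }
    }
    return len;
}

/*
 * Hypothetical helper in the spirit of the buffer_can_use_vectors()
 * suggestion: the pointer must be vector-aligned and the length (in
 * bytes) must cover at least one full chunk.
 */
static int model_can_use_vectors(const void *buf, size_t len)
{
    return (uintptr_t)buf % VEC_BYTES == 0 && len >= CHUNK_BYTES;
}
```

With a non-zero byte at offset 16 (past the first unsigned long, but still
inside the first chunk) the model returns 0, which is the repeated call
that the 'if (tmp2)' check is meant to avoid.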
Peter

>
>> +                if ((tmp = *p)) {
>> +                    goto found_middle;
>> +                }
>> +            }
>>          }
>> +        p++;
>>          result += BITS_PER_LONG;
>>          size -= BITS_PER_LONG;
>>      }
>>
>
> --
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>