From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([208.118.235.92]:58026)
	by lists.gnu.org with esmtp (Exim 4.71) (envelope-from <pl@kamp.de>)
	id 1UI2Nt-0000Ab-3B
	for qemu-devel@nongnu.org; Tue, 19 Mar 2013 15:39:51 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <pl@kamp.de>) id 1UI2Nq-0008HX-1C
	for qemu-devel@nongnu.org; Tue, 19 Mar 2013 15:39:45 -0400
Received: from ssl.dlhnet.de ([91.198.192.8]:46610 helo=ssl.dlh.net)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <pl@kamp.de>)
	id 1UI2Np-0008Gx-Qn
	for qemu-devel@nongnu.org; Tue, 19 Mar 2013 15:39:41 -0400
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 6.2 \(1499\))
From: Peter Lieven <pl@kamp.de>
In-Reply-To: <51489715.7050103@redhat.com>
Date: Tue, 19 Mar 2013 20:40:50 +0100
Content-Transfer-Encoding: quoted-printable
Message-Id: <082932DE-A201-41F9-A51F-141B6A13D39A@kamp.de>
References: <1363362619-3190-1-git-send-email-pl@kamp.de>
	<1363362619-3190-5-git-send-email-pl@kamp.de>
	<51489715.7050103@redhat.com>
Subject: Re: [Qemu-devel] [PATCHv2 4/9] bitops: use vector algorithm to
	optimize find_next_bit()
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Eric Blake <eblake@redhat.com>
Cc: qemu-devel@nongnu.org


On 19.03.2013 at 17:49, Eric Blake <eblake@redhat.com> wrote:

> On 03/15/2013 09:50 AM, Peter Lieven wrote:
>> this patch adds the usage of buffer_find_nonzero_offset()
>> to skip large areas of zeroes.
>>
>> compared to loop unrolling presented in an earlier
>> patch this adds another 50% performance benefit for
>> skipping large areas of zeroes. loop unrolling alone
>> added close to 100% speedup.
>>
>> Signed-off-by: Peter Lieven <pl@kamp.de>
>> ---
>> util/bitops.c |   26 +++++++++++++++++++++++---
>> 1 file changed, 23 insertions(+), 3 deletions(-)
>
>> +    while (size >= BITS_PER_LONG) {
>> +        if ((tmp = *p)) {
>> +             goto found_middle;
>> +        }
>> +        if (((uintptr_t) p) % sizeof(VECTYPE) == 0
>> +                && size >= BITS_PER_BYTE * sizeof(VECTYPE)
>> +                   * BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR) {
>
> Another instance where a helper function to check for alignment would be
> nice.  Except this time you have a BITS_PER_BYTE factor, so you would be
> calling something like buffer_can_use_vectors(buf, size / BITS_PER_BYTE)
>
>> +            unsigned long tmp2 =
>> +                buffer_find_nonzero_offset(p, ((size / BITS_PER_BYTE) &
>> +                           ~(BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR *
>> +                             sizeof(VECTYPE) - 1)));
>
> Type mismatch - buffer_find_nonzero_offset returns size_t, which isn't
> necessarily the same size as unsigned long.  I'm not sure if it can bite
> you.

I will look into it.
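
For reference, a rough sketch of where the two types can differ. The C
standard does not tie size_t to unsigned long: on LP64 targets (e.g. 64-bit
Linux) both are 64 bits wide, but on LLP64 targets (64-bit Windows)
unsigned long is 32 bits while size_t is 64 bits. The helper name
offset_fits_ulong() is hypothetical and not part of the patch; it only
shows the check that would catch a truncated return value.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not in the patch): returns 1 if a size_t offset
 * can be stored in an unsigned long without losing high bits. */
static int offset_fits_ulong(size_t offset)
{
    return offset <= ULONG_MAX;
}

/* Hypothetical wrapper: narrows a size_t offset to unsigned long and
 * asserts that nothing was truncated on the way. */
static unsigned long offset_to_ulong(size_t offset)
{
    assert(offset_fits_ulong(offset));
    return (unsigned long) offset;
}
```

On an LP64 build the check can never fail; it only matters where
unsigned long is narrower than size_t.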

>
>> +            result += tmp2 * BITS_PER_BYTE;
>> +            size -= tmp2 * BITS_PER_BYTE;
>> +            p += tmp2 / sizeof(unsigned long);
>> +            if (!size) {
>> +                return result;
>> +            }
>> +            if (tmp2) {
>
> Do you really need this condition, or would it suffice to just
> 'continue;' the loop?  Once buffer_find_nonzero_offset returns anything
> that leaves size as non-zero, we are guaranteed that the loop will goto
> found_middle without any further calls to buffer_find_nonzero_offset.

Not in all cases. It will if the non-zero content is in the first
sizeof(unsigned long) bytes. If not, buffer_find_nonzero_offset() is
called again and returns 0, because there is a non-zero byte within the
first sizeof(VECTYPE) * BUFFER_FIND_NONZERO_OFFSET_UNROLL_FACTOR bytes.
I added this check to avoid that.
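
To illustrate, here is a minimal standalone sketch, not the patch itself.
find_nonzero_chunk() is a simplified stand-in for
buffer_find_nonzero_offset(), and CHUNK_BYTES = 128 is an assumption
(sizeof(VECTYPE) = 16 times an unroll factor of 8). rescan_after_skip() is
a hypothetical helper that puts one non-zero byte into a zeroed buffer,
skips to the chunk that holds it, and scans again from there.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed chunk size: sizeof(VECTYPE) * unroll factor = 16 * 8. */
#define CHUNK_BYTES 128

/* Simplified stand-in for buffer_find_nonzero_offset(): scans len bytes
 * (a multiple of CHUNK_BYTES) one chunk at a time and returns the byte
 * offset of the first chunk containing a non-zero byte, or len if every
 * chunk is zero. */
static size_t find_nonzero_chunk(const unsigned char *buf, size_t len)
{
    static const unsigned char zeroes[CHUNK_BYTES];
    size_t off;

    for (off = 0; off < len; off += CHUNK_BYTES) {
        if (memcmp(buf + off, zeroes, CHUNK_BYTES) != 0) {
            break;
        }
    }
    return off;
}

/* Hypothetical demo helper: sets one non-zero byte at nonzero_at, stores
 * the offset found by the first scan in *first, then returns the result
 * of a second scan that starts at that offset. */
static size_t rescan_after_skip(size_t nonzero_at, size_t *first)
{
    static unsigned char buf[4 * CHUNK_BYTES];

    memset(buf, 0, sizeof(buf));
    buf[nonzero_at] = 1;
    *first = find_nonzero_chunk(buf, sizeof(buf));
    return find_nonzero_chunk(buf + *first, sizeof(buf) - *first);
}
```

With the non-zero byte at offset 300, the first scan stops at the start of
the chunk at offset 256, and the second scan from there returns 0: it makes
no progress, because the byte is inside the chunk but not in its first word.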

Peter


>
>> +                if ((tmp = *p)) {
>> +                    goto found_middle;
>> +                }
>> +            }
>>         }
>> +        p++;
>>         result += BITS_PER_LONG;
>>         size -= BITS_PER_LONG;
>>     }
>>
>
> --
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
>