* [PATCH v3 0/3] lib: find_*_bit reimplementation
@ 2015-02-08 14:10 Yury Norov
  2015-02-08 14:10 ` [PATCH v3 1/3] " Yury Norov
                   ` (4 more replies)
  0 siblings, 5 replies; 27+ messages in thread
From: Yury Norov @ 2015-02-08 14:10 UTC (permalink / raw)
  To: klimov.linux, davem, akpm, hannes, dborkman, laijs,
	takahiro.akashi, valentinrothberg, linux, msalter, chris, tgraf
  Cc: Yury Norov, linux-kernel

This patchset reworks the find_bit function family to achieve better
performance and to decrease text size. All the rework is done in
patch 1; patches 2 and 3 only move and rename code.

It was boot-tested on x86_64 and MIPS (big-endian) machines.
Performance tests were run in userspace with code like this:

	/* addr[] is filled from /dev/urandom */
	unsigned long ret = 0;

	start = clock();
	while (ret < nbits)
		ret = find_next_bit(addr, nbits, ret + 1);

	end = clock();
	printf("%ld\t", (unsigned long)(end - start));

On an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz the results are as follows
(for find_next_bit, nbits is 8M; for find_first_bit, 80K):

	find_next_bit:		find_first_bit:
	new	current		new	current
	26932	43151		14777	14925
	26947	43182		14521	15423
	26507	43824		15053	14705
	27329	43759		14473	14777
	26895	43367		14847	15023
	26990	43693		15103	15163
	26775	43299		15067	15232
	27282	42752		14544	15121
	27504	43088		14644	14858
	26761	43856		14699	15193
	26692	43075		14781	14681
	27137	42969		14451	15061
	...			...

The find_next_bit performance gain is 35-40%; for find_first_bit there
is no measurable difference.
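
The fragment above can be fleshed out into a self-contained userspace
harness along these lines. This is only a sketch: find_next_bit is
re-implemented from the patch 1 algorithm with GCC builtins standing in
for the kernel helpers, and the first-word masking is done
unconditionally (equivalent, because HIGH_BITS_MASK(0) is an all-ones
mask):

	#include <stdio.h>
	#include <stdlib.h>
	#include <time.h>

	#define BITS_PER_LONG		64
	#define HIGH_BITS_MASK(nr)	(~0UL << (nr))
	#define min(a, b)		((a) < (b) ? (a) : (b))

	/* __ffs: index of the least significant set bit; word must be non-zero. */
	static unsigned long __ffs(unsigned long word)
	{
		return (unsigned long)__builtin_ctzl(word);
	}

	/* Userspace copy of the new find_next_bit (set-bit search only). */
	static unsigned long find_next_bit(const unsigned long *addr,
					   unsigned long nbits, unsigned long start)
	{
		unsigned long tmp;

		if (start >= nbits)
			return nbits;

		tmp = addr[start / BITS_PER_LONG];
		tmp &= HIGH_BITS_MASK(start % BITS_PER_LONG);	/* mask the 1st word */
		start -= start % BITS_PER_LONG;			/* round down */

		while (!tmp) {
			start += BITS_PER_LONG;
			if (start >= nbits)
				return nbits;
			tmp = addr[start / BITS_PER_LONG];
		}

		return min(start + __ffs(tmp), nbits);
	}

	int main(void)
	{
		const unsigned long nbits = 8UL << 20;	/* 8M bits, as above */
		unsigned long *addr = malloc(nbits / 8);
		unsigned long ret = 0;
		clock_t start, end;
		FILE *f = fopen("/dev/urandom", "r");

		if (!addr || !f || fread(addr, 1, nbits / 8, f) != nbits / 8)
			return 1;
		fclose(f);

		start = clock();
		while (ret < nbits)
			ret = find_next_bit(addr, nbits, ret + 1);
		end = clock();

		printf("%ld\n", (long)(end - start));
		free(addr);
		return 0;
	}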

On ARM machines there is an arch-specific implementation of find_bit.
To disable it and use the generic one, apply the following patch:
	
	---
	 arch/arm/include/asm/bitops.h   | 19 -------------------
	 arch/arm/kernel/armksyms.c      | 11 -----------
	 arch/arm/lib/Makefile           |  2 +-
	 include/asm-generic/bitops/le.h |  1 +
	 4 files changed, 2 insertions(+), 31 deletions(-)

	diff --git a/arch/arm/include/asm/bitops.h b/arch/arm/include/asm/bitops.h
	index 5638099..e0611d1 100644
	--- a/arch/arm/include/asm/bitops.h
	+++ b/arch/arm/include/asm/bitops.h
	@@ -192,25 +192,6 @@ extern int _find_next_bit_be(const unsigned long *p, int size, int offset);
	 #define test_and_clear_bit(nr,p)	ATOMIC_BITOP(test_and_clear_bit,nr,p)
	 #define test_and_change_bit(nr,p)	ATOMIC_BITOP(test_and_change_bit,nr,p)
	 
	-#ifndef __ARMEB__
	-/*
	- * These are the little endian, atomic definitions.
	- */
	-#define find_first_zero_bit(p,sz)	_find_first_zero_bit_le(p,sz)
	-#define find_next_zero_bit(p,sz,off)	_find_next_zero_bit_le(p,sz,off)
	-#define find_first_bit(p,sz)		_find_first_bit_le(p,sz)
	-#define find_next_bit(p,sz,off)		_find_next_bit_le(p,sz,off)
	-
	-#else
	-/*
	- * These are the big endian, atomic definitions.
	- */
	-#define find_first_zero_bit(p,sz)	_find_first_zero_bit_be(p,sz)
	-#define find_next_zero_bit(p,sz,off)	_find_next_zero_bit_be(p,sz,off)
	-#define find_first_bit(p,sz)		_find_first_bit_be(p,sz)
	-#define find_next_bit(p,sz,off)		_find_next_bit_be(p,sz,off)
	-
	-#endif
	 
	 #if __LINUX_ARM_ARCH__ < 5
	 
	diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
	index a88671c..22e8748 100644
	--- a/arch/arm/kernel/armksyms.c
	+++ b/arch/arm/kernel/armksyms.c
	@@ -146,17 +146,6 @@ EXPORT_SYMBOL(_clear_bit);
	 EXPORT_SYMBOL(_test_and_clear_bit);
	 EXPORT_SYMBOL(_change_bit);
	 EXPORT_SYMBOL(_test_and_change_bit);
	-EXPORT_SYMBOL(_find_first_zero_bit_le);
	-EXPORT_SYMBOL(_find_next_zero_bit_le);
	-EXPORT_SYMBOL(_find_first_bit_le);
	-EXPORT_SYMBOL(_find_next_bit_le);
	-
	-#ifdef __ARMEB__
	-EXPORT_SYMBOL(_find_first_zero_bit_be);
	-EXPORT_SYMBOL(_find_next_zero_bit_be);
	-EXPORT_SYMBOL(_find_first_bit_be);
	-EXPORT_SYMBOL(_find_next_bit_be);
	-#endif
	 
	 #ifdef CONFIG_FUNCTION_TRACER
	 #ifdef CONFIG_OLD_MCOUNT
	diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
	index 0573faa..de369aa 100644
	--- a/arch/arm/lib/Makefile
	+++ b/arch/arm/lib/Makefile
	@@ -6,7 +6,7 @@
	 
	 lib-y		:= backtrace.o changebit.o csumipv6.o csumpartial.o   \
			   csumpartialcopy.o csumpartialcopyuser.o clearbit.o \
	-		   delay.o delay-loop.o findbit.o memchr.o memcpy.o   \
	+		   delay.o delay-loop.o memchr.o memcpy.o	      \
			   memmove.o memset.o memzero.o setbit.o              \
			   strchr.o strrchr.o                                 \
			   testchangebit.o testclearbit.o testsetbit.o        \
	diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
	index 6173154..9a8798f 100644
	--- a/include/asm-generic/bitops/le.h
	+++ b/include/asm-generic/bitops/le.h
	@@ -2,6 +2,7 @@
	 #define _ASM_GENERIC_BITOPS_LE_H_
	 
	 #include <asm/types.h>
	+#include <asm-generic/bitops/find.h>
	 #include <asm/byteorder.h>
	 
	 #if defined(__LITTLE_ENDIAN)
	-- 
	1.9.1
	
v3:
* conditional bit inverting in _find_next_bit replaced with XORing by a
  mask (patch 1);
* size/offset checks moved from the wrappers into _find_next_bit (patch 1);
* #include <linux/bitops.h> restored (patch 1);
* the lib/find_next_bit.c file description changed so that it no longer
  refers to the file name, which lets the rename in patch 3 produce a
  clean diff (patch 2);
* patch 3 generated with the '-M' option, so the rename is detected and
  the file content is dropped from the diff;
* this cover letter added.

Yury Norov (3):
  lib: find_*_bit reimplementation
  lib: move find_last_bit to lib/find_next_bit.c
  lib: rename lib/find_next_bit.c to lib/find_bit.c

 lib/Makefile        |   2 +-
 lib/find_bit.c      | 197 ++++++++++++++++++++++++++++++++++++
 lib/find_last_bit.c |  49 ---------
 lib/find_next_bit.c | 285 ----------------------------------------------------
 4 files changed, 198 insertions(+), 335 deletions(-)
 create mode 100644 lib/find_bit.c
 delete mode 100644 lib/find_last_bit.c
 delete mode 100644 lib/find_next_bit.c

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Tested-by: Alexey Klimov <klimov.linux@gmail.com> (odroid-xu3 ARMv7 SoC).
-- 
2.1.0



* [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-08 14:10 [PATCH v3 0/3] lib: find_*_bit reimplementation Yury Norov
@ 2015-02-08 14:10 ` Yury Norov
  2015-02-08 18:48   ` George Spelvin
  2015-02-09  8:32   ` George Spelvin
  2015-02-08 14:10 ` [PATCH v3 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 27+ messages in thread
From: Yury Norov @ 2015-02-08 14:10 UTC (permalink / raw)
  To: klimov.linux, davem, akpm, hannes, dborkman, laijs,
	takahiro.akashi, valentinrothberg, linux, msalter, chris, tgraf
  Cc: Yury Norov, linux-kernel

The new implementation takes less space in the source file (see the
diffstat) and in the object file: for me, the text size drops from 710
to 453 bytes. It also shows better performance.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/find_last_bit.c |  30 ++----
 lib/find_next_bit.c | 275 ++++++++++++++++------------------------------------
 2 files changed, 92 insertions(+), 213 deletions(-)

diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
index 91ca09f..106050f 100644
--- a/lib/find_last_bit.c
+++ b/lib/find_last_bit.c
@@ -4,6 +4,9 @@
  * Written by Rusty Russell <rusty@rustcorp.com.au>
  * (Inspired by David Howell's find_next_bit implementation)
  *
+ * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
+ * size and improve performance, 2015.
+ *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
  * as published by the Free Software Foundation; either version
@@ -12,36 +15,19 @@
 
 #include <linux/bitops.h>
 #include <linux/export.h>
-#include <asm/types.h>
-#include <asm/byteorder.h>
+#include <linux/kernel.h>
 
 #ifndef find_last_bit
 
 unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
 {
-	unsigned long words;
-	unsigned long tmp;
-
-	/* Start at final word. */
-	words = size / BITS_PER_LONG;
-
-	/* Partial final word? */
-	if (size & (BITS_PER_LONG-1)) {
-		tmp = (addr[words] & (~0UL >> (BITS_PER_LONG
-					 - (size & (BITS_PER_LONG-1)))));
-		if (tmp)
-			goto found;
-	}
+	unsigned long idx = DIV_ROUND_UP(size, BITS_PER_LONG);
 
-	while (words) {
-		tmp = addr[--words];
-		if (tmp) {
-found:
-			return words * BITS_PER_LONG + __fls(tmp);
-		}
+	while (idx--) {
+		if (addr[idx])
+			return min(idx * BITS_PER_LONG + __fls(addr[idx]), size);
 	}
 
-	/* Not found */
 	return size;
 }
 EXPORT_SYMBOL(find_last_bit);
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index 0cbfc0b..71aa497 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -3,6 +3,9 @@
  * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
+ * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
+ * size and improve performance, 2015.
+ *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
  * as published by the Free Software Foundation; either version
@@ -11,10 +14,48 @@
 
 #include <linux/bitops.h>
 #include <linux/export.h>
-#include <asm/types.h>
-#include <asm/byteorder.h>
+#include <linux/kernel.h>
+
+#define HIGH_BITS_MASK(nr)		(ULONG_MAX << (nr))
+
+#if !defined(find_next_bit) || !defined(find_next_zero_bit)
+static unsigned long _find_next_bit(const unsigned long *addr,
+		unsigned long nbits, unsigned long start, unsigned long mask)
+{
+	unsigned long tmp;
+
+	if (start >= nbits)
+		return nbits;
+
+	tmp = addr[start / BITS_PER_LONG] ^ mask;
 
-#define BITOP_WORD(nr)		((nr) / BITS_PER_LONG)
+	/* Handle 1st word. */
+	if (!IS_ALIGNED(start, BITS_PER_LONG)) {
+		tmp &= HIGH_BITS_MASK(start % BITS_PER_LONG);
+		start = round_down(start, BITS_PER_LONG);
+	}
+
+	while (!tmp) {
+		start += BITS_PER_LONG;
+		if (start >= nbits)
+			return nbits;
+
+		/*
+		 * This is equivalent to:
+		 *
+		 * tmp = find_set ? addr[start / BITS_PER_LONG]
+		 *	: ~addr[start / BITS_PER_LONG];
+		 *
+		 * but saves a branch condition.
+		 *
+		 * Thanks to George Spelvin <linux@horizon.com> for the idea.
+		 */
+		tmp = addr[start / BITS_PER_LONG] ^ mask;
+	}
+
+	return min(start + __ffs(tmp), nbits);
+}
+#endif
 
 #ifndef find_next_bit
 /*
@@ -23,86 +64,16 @@
 unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
 			    unsigned long offset)
 {
-	const unsigned long *p = addr + BITOP_WORD(offset);
-	unsigned long result = offset & ~(BITS_PER_LONG-1);
-	unsigned long tmp;
-
-	if (offset >= size)
-		return size;
-	size -= result;
-	offset %= BITS_PER_LONG;
-	if (offset) {
-		tmp = *(p++);
-		tmp &= (~0UL << offset);
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
-	while (size & ~(BITS_PER_LONG-1)) {
-		if ((tmp = *(p++)))
-			goto found_middle;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
-	}
-	if (!size)
-		return result;
-	tmp = *p;
-
-found_first:
-	tmp &= (~0UL >> (BITS_PER_LONG - size));
-	if (tmp == 0UL)		/* Are any bits set? */
-		return result + size;	/* Nope. */
-found_middle:
-	return result + __ffs(tmp);
+	return _find_next_bit(addr, size, offset, 0UL);
 }
 EXPORT_SYMBOL(find_next_bit);
 #endif
 
 #ifndef find_next_zero_bit
-/*
- * This implementation of find_{first,next}_zero_bit was stolen from
- * Linus' asm-alpha/bitops.h.
- */
 unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
 				 unsigned long offset)
 {
-	const unsigned long *p = addr + BITOP_WORD(offset);
-	unsigned long result = offset & ~(BITS_PER_LONG-1);
-	unsigned long tmp;
-
-	if (offset >= size)
-		return size;
-	size -= result;
-	offset %= BITS_PER_LONG;
-	if (offset) {
-		tmp = *(p++);
-		tmp |= ~0UL >> (BITS_PER_LONG - offset);
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (~tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
-	while (size & ~(BITS_PER_LONG-1)) {
-		if (~(tmp = *(p++)))
-			goto found_middle;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
-	}
-	if (!size)
-		return result;
-	tmp = *p;
-
-found_first:
-	tmp |= ~0UL << size;
-	if (tmp == ~0UL)	/* Are any bits zero? */
-		return result + size;	/* Nope. */
-found_middle:
-	return result + ffz(tmp);
+	return _find_next_bit(addr, size, offset, ~0UL);
 }
 EXPORT_SYMBOL(find_next_zero_bit);
 #endif
@@ -113,24 +84,14 @@ EXPORT_SYMBOL(find_next_zero_bit);
  */
 unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
 {
-	const unsigned long *p = addr;
-	unsigned long result = 0;
-	unsigned long tmp;
+	unsigned long idx;
 
-	while (size & ~(BITS_PER_LONG-1)) {
-		if ((tmp = *(p++)))
-			goto found;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+		if (addr[idx])
+			return min(idx * BITS_PER_LONG + __ffs(addr[idx]), size);
 	}
-	if (!size)
-		return result;
 
-	tmp = (*p) & (~0UL >> (BITS_PER_LONG - size));
-	if (tmp == 0UL)		/* Are any bits set? */
-		return result + size;	/* Nope. */
-found:
-	return result + __ffs(tmp);
+	return size;
 }
 EXPORT_SYMBOL(find_first_bit);
 #endif
@@ -141,24 +102,14 @@ EXPORT_SYMBOL(find_first_bit);
  */
 unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
 {
-	const unsigned long *p = addr;
-	unsigned long result = 0;
-	unsigned long tmp;
+	unsigned long idx;
 
-	while (size & ~(BITS_PER_LONG-1)) {
-		if (~(tmp = *(p++)))
-			goto found;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+		if (addr[idx] != ULONG_MAX)
+			return min(idx * BITS_PER_LONG + ffz(addr[idx]), size);
 	}
-	if (!size)
-		return result;
 
-	tmp = (*p) | (~0UL << size);
-	if (tmp == ~0UL)	/* Are any bits zero? */
-		return result + size;	/* Nope. */
-found:
-	return result + ffz(tmp);
+	return size;
 }
 EXPORT_SYMBOL(find_first_zero_bit);
 #endif
@@ -166,18 +117,6 @@ EXPORT_SYMBOL(find_first_zero_bit);
 #ifdef __BIG_ENDIAN
 
 /* include/linux/byteorder does not support "unsigned long" type */
-static inline unsigned long ext2_swabp(const unsigned long * x)
-{
-#if BITS_PER_LONG == 64
-	return (unsigned long) __swab64p((u64 *) x);
-#elif BITS_PER_LONG == 32
-	return (unsigned long) __swab32p((u32 *) x);
-#else
-#error BITS_PER_LONG not defined
-#endif
-}
-
-/* include/linux/byteorder doesn't support "unsigned long" type */
 static inline unsigned long ext2_swab(const unsigned long y)
 {
 #if BITS_PER_LONG == 64
@@ -189,48 +128,40 @@ static inline unsigned long ext2_swab(const unsigned long y)
 #endif
 }
 
-#ifndef find_next_zero_bit_le
-unsigned long find_next_zero_bit_le(const void *addr, unsigned
-		long size, unsigned long offset)
+#if !defined(find_next_bit_le) || !defined(find_next_zero_bit_le)
+static unsigned long _find_next_bit_le(const unsigned long *addr,
+		unsigned long nbits, unsigned long start, unsigned long mask)
 {
-	const unsigned long *p = addr;
-	unsigned long result = offset & ~(BITS_PER_LONG - 1);
 	unsigned long tmp;
 
-	if (offset >= size)
-		return size;
-	p += BITOP_WORD(offset);
-	size -= result;
-	offset &= (BITS_PER_LONG - 1UL);
-	if (offset) {
-		tmp = ext2_swabp(p++);
-		tmp |= (~0UL >> (BITS_PER_LONG - offset));
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (~tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
+	if (start >= nbits)
+		return nbits;
+
+	tmp = addr[start / BITS_PER_LONG] ^ mask;
+
+	/* Handle 1st word. */
+	if (!IS_ALIGNED(start, BITS_PER_LONG)) {
+		tmp &= ext2_swab(HIGH_BITS_MASK(start % BITS_PER_LONG));
+		start = round_down(start, BITS_PER_LONG);
 	}
 
-	while (size & ~(BITS_PER_LONG - 1)) {
-		if (~(tmp = *(p++)))
-			goto found_middle_swap;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	while (!tmp) {
+		start += BITS_PER_LONG;
+		if (start >= nbits)
+			return nbits;
+
+		tmp = addr[start / BITS_PER_LONG] ^ mask;
 	}
-	if (!size)
-		return result;
-	tmp = ext2_swabp(p);
-found_first:
-	tmp |= ~0UL << size;
-	if (tmp == ~0UL)	/* Are any bits zero? */
-		return result + size; /* Nope. Skip ffz */
-found_middle:
-	return result + ffz(tmp);
 
-found_middle_swap:
-	return result + ffz(ext2_swab(tmp));
+	return min(start + __ffs(ext2_swab(tmp)), nbits);
+}
+#endif
+
+#ifndef find_next_zero_bit_le
+unsigned long find_next_zero_bit_le(const void *addr, unsigned
+		long size, unsigned long offset)
+{
+	return _find_next_bit_le(addr, size, offset, ~0UL);
 }
 EXPORT_SYMBOL(find_next_zero_bit_le);
 #endif
@@ -239,45 +170,7 @@ EXPORT_SYMBOL(find_next_zero_bit_le);
 unsigned long find_next_bit_le(const void *addr, unsigned
 		long size, unsigned long offset)
 {
-	const unsigned long *p = addr;
-	unsigned long result = offset & ~(BITS_PER_LONG - 1);
-	unsigned long tmp;
-
-	if (offset >= size)
-		return size;
-	p += BITOP_WORD(offset);
-	size -= result;
-	offset &= (BITS_PER_LONG - 1UL);
-	if (offset) {
-		tmp = ext2_swabp(p++);
-		tmp &= (~0UL << offset);
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
-
-	while (size & ~(BITS_PER_LONG - 1)) {
-		tmp = *(p++);
-		if (tmp)
-			goto found_middle_swap;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
-	}
-	if (!size)
-		return result;
-	tmp = ext2_swabp(p);
-found_first:
-	tmp &= (~0UL >> (BITS_PER_LONG - size));
-	if (tmp == 0UL)		/* Are any bits set? */
-		return result + size; /* Nope. */
-found_middle:
-	return result + __ffs(tmp);
-
-found_middle_swap:
-	return result + __ffs(ext2_swab(tmp));
+	return _find_next_bit_le(addr, size, offset, 0UL);
 }
 EXPORT_SYMBOL(find_next_bit_le);
 #endif
-- 
2.1.0



* [PATCH v3 2/3] lib: move find_last_bit to lib/find_next_bit.c
  2015-02-08 14:10 [PATCH v3 0/3] lib: find_*_bit reimplementation Yury Norov
  2015-02-08 14:10 ` [PATCH v3 1/3] " Yury Norov
@ 2015-02-08 14:10 ` Yury Norov
  2015-02-08 14:10 ` [PATCH v3 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 27+ messages in thread
From: Yury Norov @ 2015-02-08 14:10 UTC (permalink / raw)
  To: klimov.linux, davem, akpm, hannes, dborkman, laijs,
	takahiro.akashi, valentinrothberg, linux, msalter, chris, tgraf
  Cc: Yury Norov, linux-kernel

Currently the whole 'find_*_bit' family is located in lib/find_next_bit.c,
except for 'find_last_bit', which is in lib/find_last_bit.c. There seems
to be no major benefit in keeping it separate.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/Makefile        |  2 +-
 lib/find_last_bit.c | 35 -----------------------------------
 lib/find_next_bit.c | 21 ++++++++++++++++++++-
 3 files changed, 21 insertions(+), 37 deletions(-)
 delete mode 100644 lib/find_last_bit.c

diff --git a/lib/Makefile b/lib/Makefile
index 3c3b30b..13990aa 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -25,7 +25,7 @@ obj-y	+= lockref.o
 obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 	 bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \
 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o iovec.o clz_ctz.o \
-	 bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \
+	 bsearch.o find_next_bit.o llist.o memweight.o kfifo.o \
 	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
deleted file mode 100644
index 106050f..0000000
--- a/lib/find_last_bit.c
+++ /dev/null
@@ -1,35 +0,0 @@
-/* find_last_bit.c: fallback find next bit implementation
- *
- * Copyright (C) 2008 IBM Corporation
- * Written by Rusty Russell <rusty@rustcorp.com.au>
- * (Inspired by David Howell's find_next_bit implementation)
- *
- * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
- * size and improve performance, 2015.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#include <linux/bitops.h>
-#include <linux/export.h>
-#include <linux/kernel.h>
-
-#ifndef find_last_bit
-
-unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
-{
-	unsigned long idx = DIV_ROUND_UP(size, BITS_PER_LONG);
-
-	while (idx--) {
-		if (addr[idx])
-			return min(idx * BITS_PER_LONG + __fls(addr[idx]), size);
-	}
-
-	return size;
-}
-EXPORT_SYMBOL(find_last_bit);
-
-#endif
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index 71aa497..5ec0ab9 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -1,8 +1,12 @@
-/* find_next_bit.c: fallback find next bit implementation
+/* bit search implementation
  *
  * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
+ * Copyright (C) 2008 IBM Corporation
+ * 'find_last_bit' is written by Rusty Russell <rusty@rustcorp.com.au>
+ * (Inspired by David Howell's find_next_bit implementation)
+ *
  * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
  * size and improve performance, 2015.
  *
@@ -114,6 +118,21 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
 EXPORT_SYMBOL(find_first_zero_bit);
 #endif
 
+#ifndef find_last_bit
+unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
+{
+	unsigned long idx = DIV_ROUND_UP(size, BITS_PER_LONG);
+
+	while (idx--) {
+		if (addr[idx])
+			return min(idx * BITS_PER_LONG + __fls(addr[idx]), size);
+	}
+
+	return size;
+}
+EXPORT_SYMBOL(find_last_bit);
+#endif
+
 #ifdef __BIG_ENDIAN
 
 /* include/linux/byteorder does not support "unsigned long" type */
-- 
2.1.0



* [PATCH v3 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c
  2015-02-08 14:10 [PATCH v3 0/3] lib: find_*_bit reimplementation Yury Norov
  2015-02-08 14:10 ` [PATCH v3 1/3] " Yury Norov
  2015-02-08 14:10 ` [PATCH v3 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
@ 2015-02-08 14:10 ` Yury Norov
  2015-02-17  2:35 ` [PATCH v4 0/3] lib: find_*_bit reimplementation Yury Norov
  2015-02-22 17:24 ` [PATCH v5 0/3] lib: find_*_bit reimplementation Yury Norov
  4 siblings, 0 replies; 27+ messages in thread
From: Yury Norov @ 2015-02-08 14:10 UTC (permalink / raw)
  To: klimov.linux, davem, akpm, hannes, dborkman, laijs,
	takahiro.akashi, valentinrothberg, linux, msalter, chris, tgraf
  Cc: Yury Norov, linux-kernel

This file contains the implementations of all find_*_bit{,_le} functions,
so giving it a more generic name seems reasonable.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 lib/Makefile                        | 2 +-
 lib/{find_next_bit.c => find_bit.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename lib/{find_next_bit.c => find_bit.c} (100%)

diff --git a/lib/Makefile b/lib/Makefile
index 13990aa..1cc93f4 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -25,7 +25,7 @@ obj-y	+= lockref.o
 obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 	 bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \
 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o iovec.o clz_ctz.o \
-	 bsearch.o find_next_bit.o llist.o memweight.o kfifo.o \
+	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
 	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/find_next_bit.c b/lib/find_bit.c
similarity index 100%
rename from lib/find_next_bit.c
rename to lib/find_bit.c
-- 
2.1.0



* Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-08 14:10 ` [PATCH v3 1/3] " Yury Norov
@ 2015-02-08 18:48   ` George Spelvin
  2015-02-09  8:32   ` George Spelvin
  1 sibling, 0 replies; 27+ messages in thread
From: George Spelvin @ 2015-02-08 18:48 UTC (permalink / raw)
  To: akpm, chris, davem, dborkman, hannes, klimov.linux, laijs, linux,
	msalter, takahiro.akashi, tgraf, valentinrothberg, yury.norov
  Cc: linux-kernel

This basically has my Reviewed-by: (I'll send it in a few hours,
when I have time to do a real final copy-editing pass), but a few minor
notes:

>+		/*
>+		 * This is equivalent to:
>+		 *
>+		 * tmp = find_set ? addr[start / BITS_PER_LONG]
>+		 *	: ~addr[start / BITS_PER_LONG];
>+		 *
>+		 * but saves a branch condition.
>+		 *
>+		 * Thanks to George Spelvin <linux@horizon.com> for the idea.
>+		 */
>+		tmp = addr[start / BITS_PER_LONG] ^ mask;

1. There's no need for detailed credit for such a trivial and obvious
   thing.  If you want to comment it, describe the use of the parameter
   in the function header, e.g.

+/*
+ * "mask" is either 0 or ~0UL and XORed into each fetched word, to select between
+ * searching for one bits and zero bits.  If this function is inlined, GCC is
+ * smart enough to propagate the constant.
+ */

2. The variable might be more meaningfully named "xor" or "invert"; "mask"
   suggests something that is &-ed with a value.


> unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
> 				 unsigned long offset)
> {
>+	return _find_next_bit(addr, size, offset, ~0UL);	<---
> }

> unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
> {
>+	unsigned long idx;
> 
>+	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
>+		if (addr[idx] != ULONG_MAX)			<---
>+			return min(idx * BITS_PER_LONG + ffz(addr[idx]), size);
> 	}
> 
>+	return size;
> }

3. Using two names (ULONG_MAX and ~0UL) for the same thing is a bit odd;
   you might want to be consistent.

I'll ack it either way; none of these are significant technical issues.


* Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-08 14:10 ` [PATCH v3 1/3] " Yury Norov
  2015-02-08 18:48   ` George Spelvin
@ 2015-02-09  8:32   ` George Spelvin
  2015-02-09 11:53     ` Rasmus Villemoes
  1 sibling, 1 reply; 27+ messages in thread
From: George Spelvin @ 2015-02-09  8:32 UTC (permalink / raw)
  To: akpm, chris, davem, dborkman, hannes, klimov.linux, laijs, linux,
	msalter, takahiro.akashi, tgraf, valentinrothberg, yury.norov
  Cc: linux-kernel

A few more comments on the code.  Two are minor, but one seems
like a bug, so for now, it's

Nacked-by: George Spelvin <linux@horizon.com> 

Specifically, it seems like find_last_bit used to ignore trailing
garbage in the bitmap, but now will stop searching if the last word
contains some set bits not within size.


The minor one is that I don't think the first-word masking needs to
be conditional.  The general code works fine if the start is aligned
(HIGH_BITS_MASK just generates an all-ones mask), is quite quick, and
saves a test & conditional branch.

A second minor one is that the comment in include/linux/bitops.h has
an obvious typo, and should probably be fixed, too.

Here's a diff on top of yours with all my suggested changes, both
these two and the ones from my previous e-mail.

diff --git a/find_last_bit.c b/find_last_bit.c
index e67e970..106050f 100644
--- a/find_last_bit.c
+++ b/find_last_bit.c
@@ -13,6 +13,7 @@
  * 2 of the License, or (at your option) any later version.
  */
 
+#include <linux/bitops.h>
 #include <linux/export.h>
 #include <linux/kernel.h>
 
diff --git a/find_next_bit.c b/find_next_bit.c
index 71aa497..9d01d4a 100644
--- a/find_next_bit.c
+++ b/find_next_bit.c
@@ -16,41 +16,34 @@
 #include <linux/export.h>
 #include <linux/kernel.h>
 
-#define HIGH_BITS_MASK(nr)		(ULONG_MAX << (nr))
+#define HIGH_BITS_MASK(nr)		(~0UL << (nr))
 
 #if !defined(find_next_bit) || !defined(find_next_zero_bit)
+
+/*
+ * This is a common helper function for find_next_bit and
+ * find_next_zero_bit.  The difference is the "invert" argument, which
+ * is XORed with each fetched word before searching it for one bits.
+ */
 static unsigned long _find_next_bit(const unsigned long *addr,
-		unsigned long nbits, unsigned long start, unsigned long mask)
+		unsigned long nbits, unsigned long start, unsigned long invert)
 {
 	unsigned long tmp;
 
 	if (start >= nbits)
 		return nbits;
 
-	tmp = addr[start / BITS_PER_LONG] ^ mask;
+	tmp = addr[start / BITS_PER_LONG] ^ invert;
 
 	/* Handle 1st word. */
-	if (!IS_ALIGNED(start, BITS_PER_LONG)) {
-		tmp &= HIGH_BITS_MASK(start % BITS_PER_LONG);
-		start = round_down(start, BITS_PER_LONG);
-	}
+	tmp &= HIGH_BITS_MASK(start % BITS_PER_LONG);
+	start = round_down(start, BITS_PER_LONG);
 
 	while (!tmp) {
 		start += BITS_PER_LONG;
 		if (start >= nbits)
 			return nbits;
-
-		/*
-		 * This is equivalent to:
-		 *
-		 * tmp = find_set ? addr[start / BITS_PER_LONG]
-		 *	: ~addr[start / BITS_PER_LONG];
-		 *
-		 * but saves a branch condition.
-		 *
-		 * Thanks to George Spelvin <linux@horizon.com> for the idea.
-		 */
-		tmp = addr[start / BITS_PER_LONG] ^ mask;
+		tmp = addr[start / BITS_PER_LONG] ^ invert;
 	}
 
 	return min(start + __ffs(tmp), nbits);
@@ -105,7 +98,7 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
 	unsigned long idx;
 
 	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
-		if (addr[idx] != ULONG_MAX)
+		if (~addr[idx])
 			return min(idx * BITS_PER_LONG + ffz(addr[idx]), size);
 	}
 
@@ -130,14 +123,14 @@ static inline unsigned long ext2_swab(const unsigned long y)
 
 #if !defined(find_next_bit_le) || !defined(find_next_zero_bit_le)
 static unsigned long _find_next_bit_le(const unsigned long *addr,
-		unsigned long nbits, unsigned long start, unsigned long mask)
+		unsigned long nbits, unsigned long start, unsigned long invert)
 {
 	unsigned long tmp;
 
 	if (start >= nbits)
 		return nbits;
 
-	tmp = addr[start / BITS_PER_LONG] ^ mask;
+	tmp = addr[start / BITS_PER_LONG] ^ invert;
 
 	/* Handle 1st word. */
 	if (!IS_ALIGNED(start, BITS_PER_LONG)) {
@@ -150,7 +143,7 @@ static unsigned long _find_next_bit_le(const unsigned long *addr,
 		if (start >= nbits)
 			return nbits;
 
-		tmp = addr[start / BITS_PER_LONG] ^ mask;
+		tmp = addr[start / BITS_PER_LONG] ^ invert;
 	}
 
 	return min(start + __ffs(ext2_swab(tmp)), nbits);
diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 5d858e02..4a5e5934 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -218,9 +218,9 @@ static inline unsigned long __ffs64(u64 word)
 /**
  * find_last_bit - find the last set bit in a memory region
  * @addr: The address to start the search at
- * @size: The maximum size to search
+ * @size: The number of bits to search
  *
- * Returns the bit number of the first set bit, or size.
+ * Returns the bit number of the last set bit, or size if no bits are set.
  */
 extern unsigned long find_last_bit(const unsigned long *addr,
 				   unsigned long size);




Now for the serious issue: I see a functional change to find_last_bit
with no justifying comment.

diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
index 91ca09f..106050f 100644
--- a/lib/find_last_bit.c
+++ b/lib/find_last_bit.c
@@ -19,29 +21,13 @@
 
 unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
 {
-	unsigned long words;
-	unsigned long tmp;
-
-	/* Start at final word. */
-	words = size / BITS_PER_LONG;
-
-	/* Partial final word? */
-	if (size & (BITS_PER_LONG-1)) {
-		tmp = (addr[words] & (~0UL >> (BITS_PER_LONG
-					 - (size & (BITS_PER_LONG-1)))));
-		if (tmp)
-			goto found;
-	}
+	unsigned long idx = DIV_ROUND_UP(size, BITS_PER_LONG);
 
-	while (words) {
-		tmp = addr[--words];
-		if (tmp) {
-found:
-			return words * BITS_PER_LONG + __fls(tmp);
-		}
+	while (idx--) {
+		if (addr[idx])
+			return min(idx * BITS_PER_LONG + __fls(addr[idx]), size);
 	}
 
-	/* Not found */
 	return size;
 }
 EXPORT_SYMBOL(find_last_bit);


Previously, the last word was masked, so bits beyond "size" were ignored.
With the revised code, something like find_last_bit(array, 96) will return 96
if array[1] >> 32 is non-zero, even if array[1] & 0xffffffff is zero.

Looking through the callers, I haven't found a case where this matters yet
so perhaps it's a safe optimization, but this really needs to be more
clearly documented if intentional.
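
To make the failure mode concrete, here is a minimal userspace sketch
(64-bit longs assumed, kernel helpers stubbed with GCC builtins) in
which a set bit inside the range is hidden by trailing garbage:

	#include <stdio.h>

	#define BITS_PER_LONG		64
	#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))
	#define min(a, b)		((a) < (b) ? (a) : (b))

	/* __fls: index of the most significant set bit; word must be non-zero. */
	static unsigned long __fls(unsigned long word)
	{
		return BITS_PER_LONG - 1 - (unsigned long)__builtin_clzl(word);
	}

	/* The revised find_last_bit, with no last-word masking. */
	static unsigned long find_last_bit(const unsigned long *addr,
					   unsigned long size)
	{
		unsigned long idx = DIV_ROUND_UP(size, BITS_PER_LONG);

		while (idx--) {
			if (addr[idx])
				return min(idx * BITS_PER_LONG + __fls(addr[idx]), size);
		}

		return size;
	}

	int main(void)
	{
		/* Bit 5 lies inside the 96-bit range; bit 104 is trailing garbage. */
		unsigned long array[2] = { 1UL << 5, 1UL << 40 };

		/*
		 * The old code masks the last word, skips it, and finds bit 5.
		 * The revised code sees a non-zero last word and stops there:
		 * min(64 + 40, 96) == 96, i.e. "no bits set".
		 */
		printf("%lu\n", find_last_bit(array, 96));
		return 0;
	}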

If no change was desired, I'd think a good way to do this would be:

 unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
 {
	size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);
 	unsigned long tmp = addr[--idx];

	tmp &= (2UL << (size % BITS_PER_LONG)) - 1;	/* Mask last word */

	while (!tmp) {
		if (!idx)
			return size;
 		tmp = addr[--idx];
	}
	return idx * BITS_PER_LONG + __fls(tmp);
}

Which is not that different from the old code, but cleaned up.

(Feel free to add my Signed-off-by if you think you need it.)


* Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-09  8:32   ` George Spelvin
@ 2015-02-09 11:53     ` Rasmus Villemoes
  2015-02-09 16:45       ` George Spelvin
  2015-02-11 23:05       ` Yury
  0 siblings, 2 replies; 27+ messages in thread
From: Rasmus Villemoes @ 2015-02-09 11:53 UTC (permalink / raw)
  To: George Spelvin
  Cc: akpm, chris, davem, dborkman, hannes, klimov.linux, laijs,
	msalter, takahiro.akashi, tgraf, valentinrothberg, yury.norov,
	linux-kernel

[Yury, please do remember to Cc everyone who has previously
participated]

On Mon, Feb 09 2015, "George Spelvin" <linux@horizon.com> wrote:

> A few more comments on the code.  Two are minor, but one seems
> like a bug, so for now, it's
>
> Nacked-by: George Spelvin <linux@horizon.com> 
>
> Specifically, it seems like find_last_bit used to ignore trailing
> garbage in the bitmap, but now will stop searching if the last word
> contains some set bits not within size.

True, though see below.

> The minor one is that I don't think the first-word masking needs to
> be conditional.  The general code works fine if the start is aligned
> (HIGH_BITS_MASK just generates an all-ones mask), is quite quick, and
> saves a test & conditional branch.
>

I also noted that during the first review, but when I tried to compile
it gcc actually generated slightly worse code, so I decided not to
comment on it. I don't have a strong preference either way, though.

>
> Previously, the last word was masked, so bits beyond "size" were ignored.
> With the revised code, something like find_last_bit(array, 96) will return 96
> if array[1] >> 32 is non-zero, even if array[1] & 0xffffffff is zero.
>
> Looking through the callers, I haven't found a case where this matters yet
> so perhaps it's a safe optimization, but this really needs to be more
> clearly documented if intentional.
>
> If no change was desired, I'd think a good way to do this would be:
>
>  unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
>  {
> 	size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);
>  	unsigned long tmp = addr[--idx];
>
> 	tmp &= (2UL << (size % BITS_PER_LONG)) - 1;	/* Mask last word */
>
> 	while (!tmp) {
> 		if (!idx)
> 			return size;
>  		tmp = addr[--idx];
> 	}
> 	return idx * BITS_PER_LONG + __fls(tmp);
> }

How should that work? If size is for example 1, the mask evaluates to 3UL,
while what is needed is 1UL. If size is aligned, the mask becomes 1UL,
which is also not right.

Also, I think it is best to handle size==0 appropriately, meaning that
one cannot dereference addr in any way (and certainly not addr[-1]).

So how about

unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
{
	size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);
	unsigned long mask = LAST_WORD_MASK(size);

	while (idx--) {
		unsigned long val = addr[idx] & mask;
		if (val)
			return idx * BITS_PER_LONG + __fls(val);
		mask = ~0ul;
	}
	return size;
}

Rasmus


* Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-09 11:53     ` Rasmus Villemoes
@ 2015-02-09 16:45       ` George Spelvin
  2015-02-11 22:14         ` Rasmus Villemoes
  2015-02-11 23:05       ` Yury
  1 sibling, 1 reply; 27+ messages in thread
From: George Spelvin @ 2015-02-09 16:45 UTC (permalink / raw)
  To: linux, linux
  Cc: akpm, chris, davem, dborkman, hannes, klimov.linux, laijs,
	linux-kernel, msalter, takahiro.akashi, tgraf, valentinrothberg,
	yury.norov

Sorry, I screwed up the bit-twiddling while messing with various options.
I was trying to get size == 32 to work; that should have been:

> 	tmp &= (2UL << ((size-1) % BITS_PER_LONG)) - 1;	/* Mask last word */

And you're right that LAST_WORD_MASK is a good wrapper.

Various working solutions include:
#define LAST_WORD_MASK(bits) 	((2UL << (bits-1) % BITS_PER_LONG) - 1)
#define LAST_WORD_MASK(bits) 	~(~0UL << bits % BITS_PER_LONG)
#define LAST_WORD_MASK(bits) 	(~0UL >> -bits % BITS_PER_LONG)

I'm not sure which generates the nicest code.  It's 4 instructions
each way, with the last being 1 byte smaller:

0000000000000000 <lwm1>:
   0:   8d 4f ff                lea    -0x1(%rdi),%ecx
   3:   b8 02 00 00 00          mov    $0x2,%eax
   8:   48 d3 e0                shl    %cl,%rax
   b:   48 83 e8 01             sub    $0x1,%rax
   f:   c3                      retq   

0000000000000010 <lwm2>:
  10:   48 c7 c0 ff ff ff ff    mov    $0xffffffffffffffff,%rax
  17:   89 f9                   mov    %edi,%ecx
  19:   48 d3 e0                shl    %cl,%rax
  1c:   48 f7 d0                not    %rax
  1f:   c3                      retq   

0000000000000020 <lwm3>:
  20:   89 f9                   mov    %edi,%ecx
  22:   f7 d9                   neg    %ecx
  24:   48 c7 c0 ff ff ff ff    mov    $0xffffffffffffffff,%rax
  2b:   48 d3 e8                shr    %cl,%rax
  2e:   c3                      retq   


> Also, I think it is best to handle size==0 appropriately, meaning that
> one cannot dereference addr in any way (and certainly not addr[-1]).

Ah, okay; I figured that was a safe case to omit.  But your solution is nicer
than mine overall.

It may be that omitting the mask *is* safe, but it's a lot of wading through
callers to prove it.


* Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-09 16:45       ` George Spelvin
@ 2015-02-11 22:14         ` Rasmus Villemoes
  0 siblings, 0 replies; 27+ messages in thread
From: Rasmus Villemoes @ 2015-02-11 22:14 UTC (permalink / raw)
  To: George Spelvin
  Cc: akpm, chris, davem, dborkman, hannes, klimov.linux, laijs,
	linux-kernel, msalter, takahiro.akashi, tgraf, valentinrothberg,
	yury.norov

[for some reason google decided to put this in my spam folder, hrmpf]

On Mon, Feb 09 2015, "George Spelvin" <linux@horizon.com> wrote:

> Sorry, I screwed up the bit-twiddling while messing with various options.
> I was trying to get size == 32 to work; that should have been:
>
>> 	tmp &= (2UL << ((size-1) % BITS_PER_LONG)) - 1;	/* Mask last word */
>
> And you're right that LAST_WORD_MASK is a good wrapper.
>

Well, it's not my invention, I just misremembered the
name. linux/bitmap.h already exposes BITMAP_LAST_WORD_MASK.

> Various working solutions include:
> #define LAST_WORD_MASK(bits) 	((2UL << (bits-1) % BITS_PER_LONG) - 1)
> #define LAST_WORD_MASK(bits) 	~(~0UL << bits % BITS_PER_LONG)
> #define LAST_WORD_MASK(bits) 	(~0UL >> -bits % BITS_PER_LONG)

Incidentally, I had a patch lying around replacing BITMAP_LAST_WORD_MASK
with something like the last of these (it currently uses a ?:). But to allow
bits to have a signed type, it is safer to spell it

#define BITMAP_LAST_WORD_MASK(bits) (~0UL >> ((-(bits)) & (BITS_PER_LONG-1)))

[also adding lots of parentheses so I don't have to worry about precedence].
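
A quick sanity check of that spelling (a throwaway userspace snippet,
assuming 64-bit longs) confirms it handles both aligned and unaligned
sizes:

	#include <stdio.h>

	#define BITS_PER_LONG	64
	#define BITMAP_LAST_WORD_MASK(bits) \
		(~0UL >> ((-(bits)) & (BITS_PER_LONG - 1)))

	int main(void)
	{
		printf("%#lx\n", BITMAP_LAST_WORD_MASK(1));	/* 0x1 */
		printf("%#lx\n", BITMAP_LAST_WORD_MASK(32));	/* 0xffffffff */
		printf("%#lx\n", BITMAP_LAST_WORD_MASK(64));	/* ~0UL: an aligned size keeps the whole word */
		return 0;
	}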

> I'm not sure which generates the nicest code.  It's 4 instructions
> each way, with the last being 1 byte smaller:

I think one would have to look at the effects on real code; when just
compiling a function that does nothing but this, gcc has to use specific
registers for input and output.

>> Also, I think it is best to handle size==0 appropriately, meaning that
>> one cannot dereference addr in any way (and certainly not addr[-1]).
>
> Ah, okay; l I figured that was a safe case to omit.  But your solution is nicer
> than mine overall.
>
> It may be that omitting the mask *is* safe, but it's a lot of wading through
> callers to prove it.

I think generic library code like this should provide both safety
checks, and only if some true performance bottleneck is found can one
start looking at implementing __shortcuts which have further constraints
on the caller.

Rasmus


* Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-09 11:53     ` Rasmus Villemoes
  2015-02-09 16:45       ` George Spelvin
@ 2015-02-11 23:05       ` Yury
  2015-02-12  8:15         ` George Spelvin
  1 sibling, 1 reply; 27+ messages in thread
From: Yury @ 2015-02-11 23:05 UTC (permalink / raw)
  To: Rasmus Villemoes, George Spelvin
  Cc: akpm, chris, davem, dborkman, hannes, klimov.linux, laijs,
	msalter, takahiro.akashi, tgraf, valentinrothberg, linux-kernel


On 09.02.2015 14:53, Rasmus Villemoes wrote:
> [Yury, please do remember to Cc everyone who has previously
> participated]
>
> On Mon, Feb 09 2015, "George Spelvin" <linux@horizon.com> wrote:
>
>> A few more comments on the code.  Two are minor, but one seems
>> like a bug, so for now, it's
>>
>> Nacked-by: George Spelvin <linux@horizon.com> 
>>
>> Specifically, it seems like find_last_bit used to ignore trailing
>> garbage in the bitmap, but now will stop searching if the last word
>> contains some set bits not within size.
> True, though see below.
>
>> The minor one is that I don't think the first-word masking needs to
>> be conditional.  The general code works fine if the start is aligned
>> (HIGH_BITS_MASK just generates an all-ones mask), is quite quick, and
>> saves a test & conditional branch.
>>
> I also noted that during the first review, but when I tried to compile
> it gcc actually generated slightly worse code, so I decided not to
> comment on it. I don't have a strong preference either way, though.
>
>> Previously, the last word was masked, so bits beyond "size" were ignored.
>> With the revised code, something like find_last_bit(array, 96) will return 96
>> if array[1] >> 32 is non-zero, even if array[1] & 0xffffffff is zero.
>>
>> Looking through the callers, I haven't found a case where this matters yet
>> so perhaps it's a safe optimization, but this really needs to be more
>> clearly documented if intentional.
>>
>> If no change was desired, I'd think a good way to do this would be:
>>
>>  unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
>>  {
>> 	size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);
>>  	unsigned long tmp = addr[--idx];
>>
>> 	tmp &= (2UL << (size % BITS_PER_LONG)) - 1;	/* Mask last word */
>>
>> 	while (!tmp) {
>> 		if (!idx)
>> 			return size;
>>  		tmp = addr[--idx];
>> 	}
>> 	return idx * BITS_PER_LONG + __fls(tmp);
>> }
> How should that work? If size is for example 1, the mask evaluates to 3UL,
> while what is needed is 1UL. If size is aligned, the mask becomes 1UL,
> which is also not right.
>
> Also, I think it is best to handle size==0 appropriately, meaning that
> one cannot dereference addr in any way (and certainly not addr[-1]).
>
> So how about
>
> unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
> {
> 	size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);
> 	unsigned long mask = LAST_WORD_MASK(size);
>
> 	while (idx--) {
> 		unsigned long val = addr[idx] & mask;
> 		if (val)
> 			return idx * BITS_PER_LONG + __fls(val);
> 		mask = ~0ul;
> 	}
> 	return size;
> }
>
> Rasmus
Rasmus, your version ANDs with the mask and resets the mask on each
iteration of the main loop. I think we can avoid that. What do you think
of the following?

unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
{
        size_t idx;
        unsigned long tmp;

        if (!size)
                return 0;

        idx = DIV_ROUND_UP(size, BITS_PER_LONG) - 1;
        tmp = addr[idx] & LAST_WORD_MASK(size);

        while (!tmp) {
                if (!idx--)
                        return size;
                tmp = addr[idx];
        }
        return idx * BITS_PER_LONG + __fls(tmp);
}

Yury


* Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-11 23:05       ` Yury
@ 2015-02-12  8:15         ` George Spelvin
  2015-02-12  9:58           ` Rasmus Villemoes
  0 siblings, 1 reply; 27+ messages in thread
From: George Spelvin @ 2015-02-12  8:15 UTC (permalink / raw)
  To: linux, linux, yury.norov
  Cc: akpm, chris, davem, dborkman, hannes, klimov.linux, laijs,
	linux-kernel, msalter, takahiro.akashi, tgraf, valentinrothberg

> Rasmus, your version ANDs with the mask and resets the mask on each
> iteration of the main loop. I think we can avoid that. What do you think
> of the following?

Yes, that's basically what I proposed (modulo checking for zero size and
my buggy LAST_WORD_MASK).

But two unconditional instructions in the loop are awfully minor; it's
loads and conditional branches that cost.

The reset of the mask can be done in parallel with other operations; it's
only the AND that actually takes a cycle.

I can definitely see the argument that, for code that's not used often
enough to stay resident in the L1 cache, any speedup has to win by at
least one L2 cache access to be worth taking another cache line.

For Ivy bridge, those numbers are 32 KB and 12 cycles.


* Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-12  8:15         ` George Spelvin
@ 2015-02-12  9:58           ` Rasmus Villemoes
  2015-02-12 23:46             ` George Spelvin
  0 siblings, 1 reply; 27+ messages in thread
From: Rasmus Villemoes @ 2015-02-12  9:58 UTC (permalink / raw)
  To: George Spelvin
  Cc: yury.norov, akpm, chris, davem, dborkman, hannes, klimov.linux,
	laijs, linux-kernel, msalter, takahiro.akashi, tgraf,
	valentinrothberg

On Thu, Feb 12 2015, "George Spelvin" <linux@horizon.com> wrote:

>> Rasmus, your version ANDs with the mask and resets the mask on each
>> iteration of the main loop. I think we can avoid that. What do you think
>> of the following?
>
> Yes, that's basically what I proposed (modulo checking for zero size and
> my buggy LAST_WORD_MASK).
>
> But two unconditional instructions in the loop are awfully minor; it's
> loads and conditional branches that cost.
>
> The reset of the mask can be done in parallel with other operations; it's
> only the AND that actually takes a cycle.
>
> I can definitely see the argument that, for code that's not used often
> enough to stay resident in the L1 cache, any speedup has to win by at
> least one L2 cache access to be worth taking another cache line.

That, and if the compiler was smart enough, the AND should actually be
free (at least on x86), since it can replace a test instruction. [1]

In any case, my code compiles to fewer bytes (though not an entire cache
line), and I think it is somewhat clearer - I'm obviously biased on the
latter point.

Rasmus


[1] Unfortunately, the compiler doesn't seem to be smart enough :-( When I
compile my code with gcc 5.0, I get

0000000000000000 <find_last_bit_rv>:
   0:   53                      push   %rbx
   1:   89 f1                   mov    %esi,%ecx
   3:   48 8d 5e 3f             lea    0x3f(%rsi),%rbx
   7:   f7 d9                   neg    %ecx
   9:   48 c7 c2 ff ff ff ff    mov    $0xffffffffffffffff,%rdx
  10:   48 83 e3 c0             and    $0xffffffffffffffc0,%rbx
  14:   48 d3 ea                shr    %cl,%rdx
  17:   eb 1a                   jmp    33 <find_last_bit_rv+0x33>
  19:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
  20:   48 89 d1                mov    %rdx,%rcx
  23:   48 23 0c df             and    (%rdi,%rbx,8),%rcx
  27:   48 c7 c2 ff ff ff ff    mov    $0xffffffffffffffff,%rdx
  2e:   48 85 c9                test   %rcx,%rcx
  31:   75 15                   jne    48 <find_last_bit_rv+0x48>
  33:   48 83 eb 01             sub    $0x1,%rbx
  37:   48 83 fb ff             cmp    $0xffffffffffffffff,%rbx
  3b:   75 e3                   jne    20 <find_last_bit_rv+0x20>
  3d:   48 89 f0                mov    %rsi,%rax
  40:   5b                      pop    %rbx
  41:   c3                      retq   
  42:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
  48:   48 89 cf                mov    %rcx,%rdi
  4b:   48 c1 e3 06             shl    $0x6,%rbx
  4f:   e8 00 00 00 00          callq  54 <find_last_bit_rv+0x54>
  54:   48 98                   cltq   
  56:   48 01 d8                add    %rbx,%rax
  59:   5b                      pop    %rbx
  5a:   c3                      retq   
  5b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

the main loop is 20--3b. The test instruction at 2e seems to be
redundant. The same at 37: the sub instruction already sets plenty of
flags that could be used, so explicitly comparing %rbx to -1 seems
redundant.


* Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-12  9:58           ` Rasmus Villemoes
@ 2015-02-12 23:46             ` George Spelvin
  2015-02-13 10:13               ` Rasmus Villemoes
  0 siblings, 1 reply; 27+ messages in thread
From: George Spelvin @ 2015-02-12 23:46 UTC (permalink / raw)
  To: linux, linux
  Cc: akpm, chris, davem, dborkman, hannes, klimov.linux, laijs,
	linux-kernel, msalter, takahiro.akashi, tgraf, valentinrothberg,
	yury.norov

> That, and if the compiler was smart enough, the AND should actually be
> free (at least on x86), since it can replace a test instruction. [1]

Ah, right, x86 loads don't set the flags, so you need a TEST
instruction anyway.  So the AND is free!

Of course, not all the world's an x86.  But load/store architectures
generally need a test instruction as well.  It's things like MIPS and
Alpha, that don't have condition codes, but only conditional branches
on register values, that will need an extra cycle.

> In any case, my code compiles to fewer bytes (though not an entire cache
> line), and I think it is somewhat clearer - I'm obviously biased on the
> latter point.

Myself, I think the code that shows that only the first word gets masked
by control flow is clearer, but since the complexity is completely
localized to this function, I'll take smaller and faster.

> [1] Unfortunately, the compiler doesn't seem to be smart enough :-( When I
> compile my code with gcc 5.0, I get
> 
> 0000000000000000 <find_last_bit_rv>:
>    0:   53                      push   %rbx
>    1:   89 f1                   mov    %esi,%ecx
>    3:   48 8d 5e 3f             lea    0x3f(%rsi),%rbx
>    7:   f7 d9                   neg    %ecx
>    9:   48 c7 c2 ff ff ff ff    mov    $0xffffffffffffffff,%rdx
>   10:   48 83 e3 c0             and    $0xffffffffffffffc0,%rbx
>   14:   48 d3 ea                shr    %cl,%rdx
>   17:   eb 1a                   jmp    33 <find_last_bit_rv+0x33>
>   19:   0f 1f 80 00 00 00 00    nopl   0x0(%rax)
> 
>   20:   48 89 d1                mov    %rdx,%rcx
>   23:   48 23 0c df             and    (%rdi,%rbx,8),%rcx
>   27:   48 c7 c2 ff ff ff ff    mov    $0xffffffffffffffff,%rdx
>   2e:   48 85 c9                test   %rcx,%rcx
>   31:   75 15                   jne    48 <find_last_bit_rv+0x48>
>   33:   48 83 eb 01             sub    $0x1,%rbx
>   37:   48 83 fb ff             cmp    $0xffffffffffffffff,%rbx
>   3b:   75 e3                   jne    20 <find_last_bit_rv+0x20>
> 
>   3d:   48 89 f0                mov    %rsi,%rax
>   40:   5b                      pop    %rbx
>   41:   c3                      retq   
>   42:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)
>   48:   48 89 cf                mov    %rcx,%rdi
>   4b:   48 c1 e3 06             shl    $0x6,%rbx
>   4f:   e8 00 00 00 00          callq  54 <find_last_bit_rv+0x54>
>   54:   48 98                   cltq   
>   56:   48 01 d8                add    %rbx,%rax
>   59:   5b                      pop    %rbx
>   5a:   c3                      retq   
>   5b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

> the main loop is 20--3b. The test instruction at 2e seems to be
> redundant. The same at 37: the sub instruction already sets plenty of
> flags that could be used, so explicitly comparing %rbx to -1 seems
> redundant.

Er... I think you hand-edited that code; it's wrong.  The loop assumes that
%rbx is in units of words, but the prologue sets it up in units of bits.

The mov to %rcx is also redundant, since it could be eliminated with some
minor rescheduling.

The code generation I *want* for that function is:

# addr in %rdi, size in %rsi
	movl	%esi, %ecx
	leaq	0x3f(%rsi), %rax
	negl	%ecx
	movq	$-1, %rdx
        shrq	$6, %rax
	shrq	%cl, %rdx
	jmp	2f
1:
	movq	$-1, %rdx
2:
	subq	$1, %rax
	jc	3f
	andq	(%rdi,%rax,8), %rdx
	je	1b

	bsrq	%rdx, %rdx
        salq    $6, %rax
	addq	%rdx, %rax
        ret
3:
	movq	%rsi, %rax
	retq

I wonder if the compiler can be convinced by a bit of source tweaking?

unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
{
	size_t idx = DIV_ROUND_UP(size, BITS_PER_LONG);
	unsigned long val = LAST_WORD_MASK(size);

	while (idx--) {
		val &= addr[idx];
		if (val)
			return idx * BITS_PER_LONG + __fls(val);
		val = ~0ul;
	}
	return size;
}

Darn, no difference...


* Re: [PATCH v3 1/3] lib: find_*_bit reimplementation
  2015-02-12 23:46             ` George Spelvin
@ 2015-02-13 10:13               ` Rasmus Villemoes
  0 siblings, 0 replies; 27+ messages in thread
From: Rasmus Villemoes @ 2015-02-13 10:13 UTC (permalink / raw)
  To: George Spelvin
  Cc: akpm, chris, davem, dborkman, hannes, klimov.linux, laijs,
	linux-kernel, msalter, takahiro.akashi, tgraf, valentinrothberg,
	yury.norov

On Fri, Feb 13 2015, "George Spelvin" <linux@horizon.com> wrote:

>> the main loop is 20--3b. The test instruction at 2e seems to be
>> redundant. The same at 37: the sub instruction already sets plenty of
>> flags that could be used, so explicitly comparing %rbx to -1 seems
>> redundant.
>
> Er... I think you hand-edited that code; it's wrong.  The loop assumes that
> %rbx is in units of words, but the prologue sets it up in units of bits.

No, but I messed up the source by hand :-) My DIV_ROUND_UP macro was
bogus. Well spotted. Fixing that I still see the redundant cmp and
test, though.

> The mov to %rcx is also redundant, since it could be eliminated with
> some minor rescheduling.
>
> The code generation I *want* for that function is:
>
> # addr in %rdi, size in %rsi
> 	movl	%esi, %ecx
> 	leaq	0x3f(%rsi), %rax
> 	negl	%ecx
> 	movq	$-1, %rdx
>         shrq	$6, %rax
> 	shrq	%cl, %rdx
> 	jmp	2f
> 1:
> 	movq	$-1, %rdx
> 2:
> 	subq	$1, %rax
> 	jc	3f
> 	andq	(%rdi,%rax,8), %rdx
> 	jeq	1b
>
> 	bsrq	%rdx, %rdx
>         salq    $6, %rax
> 	addq	%rdx, %rax
>         ret
> 3:
> 	movq	%rsi, %rax
> 	retq

Nice. But I don't think find_last_bit is important enough to warrant
arch-specific versions.

So, where are we with this? Have we reached some kind of consensus?

Rasmus


* [PATCH v4 0/3] lib: find_*_bit reimplementation
  2015-02-08 14:10 [PATCH v3 0/3] lib: find_*_bit reimplementation Yury Norov
                   ` (2 preceding siblings ...)
  2015-02-08 14:10 ` [PATCH v3 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
@ 2015-02-17  2:35 ` Yury Norov
  2015-02-17  2:35   ` [PATCH v4 1/3] " Yury Norov
                     ` (2 more replies)
  2015-02-22 17:24 ` [PATCH v5 0/3] lib: find_*_bit reimplementation Yury Norov
  4 siblings, 3 replies; 27+ messages in thread
From: Yury Norov @ 2015-02-17  2:35 UTC (permalink / raw)
  To: linux, klimov.linux, linux
  Cc: akpm, davem, dborkman, hannes, laijs, msalter, takahiro.akashi,
	tgraf, valentinrothberg, yury.norov, linux-kernel, chris

This patchset reworks the find_bit function family to achieve better
performance and to decrease text size. All the rework is done in
patch 1; patches 2 and 3 only move and rename code.

It was boot-tested on x86_64 and MIPS (big-endian) machines.
Performance tests were run in userspace with code like this:

	/* addr[] is filled from /dev/urandom */
	unsigned long ret = 0;

	start = clock();
	while (ret < nbits)
		ret = find_next_bit(addr, nbits, ret + 1);

	end = clock();
	printf("%ld\t", (unsigned long)(end - start));

On an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz the measurements are as
follows (for find_next_bit, nbits is 8M; for find_first_bit, 80K):

	find_next_bit:		find_first_bit:
	new	current		new	current
	26932	43151		14777	14925
	26947	43182		14521	15423
	26507	43824		15053	14705
	27329	43759		14473	14777
	26895	43367		14847	15023
	26990	43693		15103	15163
	26775	43299		15067	15232
	27282	42752		14544	15121
	27504	43088		14644	14858
	26761	43856		14699	15193
	26692	43075		14781	14681
	27137	42969		14451	15061
	...			...

The find_next_bit performance gain is 35-40%;
find_first_bit shows no measurable difference.

On ARM machines, there is an arch-specific implementation of find_bit.
To disable it and use the generic one, apply the following patch:
	
	---
	 arch/arm/include/asm/bitops.h   | 19 -------------------
	 arch/arm/kernel/armksyms.c      | 11 -----------
	 arch/arm/lib/Makefile           |  2 +-
	 include/asm-generic/bitops/le.h |  1 +
	 4 files changed, 2 insertions(+), 31 deletions(-)

	diff --git a/arch/arm/include/asm/bitops.h b/arch/arm/include/asm/bitops.h
	index 5638099..e0611d1 100644
	--- a/arch/arm/include/asm/bitops.h
	+++ b/arch/arm/include/asm/bitops.h
	@@ -192,25 +192,6 @@ extern int _find_next_bit_be(const unsigned long *p, int size, int offset);
	 #define test_and_clear_bit(nr,p)	ATOMIC_BITOP(test_and_clear_bit,nr,p)
	 #define test_and_change_bit(nr,p)	ATOMIC_BITOP(test_and_change_bit,nr,p)
	 
	-#ifndef __ARMEB__
	-/*
	- * These are the little endian, atomic definitions.
	- */
	-#define find_first_zero_bit(p,sz)	_find_first_zero_bit_le(p,sz)
	-#define find_next_zero_bit(p,sz,off)	_find_next_zero_bit_le(p,sz,off)
	-#define find_first_bit(p,sz)		_find_first_bit_le(p,sz)
	-#define find_next_bit(p,sz,off)		_find_next_bit_le(p,sz,off)
	-
	-#else
	-/*
	- * These are the big endian, atomic definitions.
	- */
	-#define find_first_zero_bit(p,sz)	_find_first_zero_bit_be(p,sz)
	-#define find_next_zero_bit(p,sz,off)	_find_next_zero_bit_be(p,sz,off)
	-#define find_first_bit(p,sz)		_find_first_bit_be(p,sz)
	-#define find_next_bit(p,sz,off)		_find_next_bit_be(p,sz,off)
	-
	-#endif
	 
	 #if __LINUX_ARM_ARCH__ < 5
	 
	diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
	index a88671c..22e8748 100644
	--- a/arch/arm/kernel/armksyms.c
	+++ b/arch/arm/kernel/armksyms.c
	@@ -146,17 +146,6 @@ EXPORT_SYMBOL(_clear_bit);
	 EXPORT_SYMBOL(_test_and_clear_bit);
	 EXPORT_SYMBOL(_change_bit);
	 EXPORT_SYMBOL(_test_and_change_bit);
	-EXPORT_SYMBOL(_find_first_zero_bit_le);
	-EXPORT_SYMBOL(_find_next_zero_bit_le);
	-EXPORT_SYMBOL(_find_first_bit_le);
	-EXPORT_SYMBOL(_find_next_bit_le);
	-
	-#ifdef __ARMEB__
	-EXPORT_SYMBOL(_find_first_zero_bit_be);
	-EXPORT_SYMBOL(_find_next_zero_bit_be);
	-EXPORT_SYMBOL(_find_first_bit_be);
	-EXPORT_SYMBOL(_find_next_bit_be);
	-#endif
	 
	 #ifdef CONFIG_FUNCTION_TRACER
	 #ifdef CONFIG_OLD_MCOUNT
	diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
	index 0573faa..de369aa 100644
	--- a/arch/arm/lib/Makefile
	+++ b/arch/arm/lib/Makefile
	@@ -6,7 +6,7 @@
	 
	 lib-y		:= backtrace.o changebit.o csumipv6.o csumpartial.o   \
			   csumpartialcopy.o csumpartialcopyuser.o clearbit.o \
	-		   delay.o delay-loop.o findbit.o memchr.o memcpy.o   \
	+		   delay.o delay-loop.o memchr.o memcpy.o	      \
			   memmove.o memset.o memzero.o setbit.o              \
			   strchr.o strrchr.o                                 \
			   testchangebit.o testclearbit.o testsetbit.o        \
	diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
	index 6173154..9a8798f 100644
	--- a/include/asm-generic/bitops/le.h
	+++ b/include/asm-generic/bitops/le.h
	@@ -2,6 +2,7 @@
	 #define _ASM_GENERIC_BITOPS_LE_H_
	 
	 #include <asm/types.h>
	+#include <asm-generic/bitops/find.h>
	 #include <asm/byteorder.h>
	 
	 #if defined(__LITTLE_ENDIAN)
	-- 
	1.9.1

Thanks a lot to George Spelvin and Rasmus Villemoes for hints and
helpful discussions.
	
v4:
* find_last_bit was found buggy and replaced with George Spelvin's
  version, which correctly handles garbage at the end of a
  word-unaligned bitmap;
* find_last_bit description fixed in 'include/linux/bitops.h';
* _find_next_bit{,_le}: first-word handling made unconditional;

v3:
* conditional bit inverting in _find_next_bit replaced with XORing by
  a mask (patch 1);
* size/offset checks moved from the wrappers to _find_next_bit (patch 1);
* #include <linux/bitops.h> restored (patch 1);
* lib/find_next_bit file description changed to avoid referring to the
  file name, so that the rename produces a clean diff (patch 2);
* patch 3 is generated with the '-M' option to detect the rename and
  drop the file contents;
* this cover letter added.

Yury Norov (3):
  lib: find_*_bit reimplementation
  lib: move find_last_bit to lib/find_next_bit.c
  lib: rename lib/find_next_bit.c to lib/find_bit.c

 include/linux/bitops.h |   4 +-
 lib/Makefile           |   2 +-
 lib/find_bit.c         | 195 +++++++++++++++++++++++++++++++++
 lib/find_last_bit.c    |  49 ---------
 lib/find_next_bit.c    | 285 -------------------------------------------------
 5 files changed, 198 insertions(+), 337 deletions(-)
 create mode 100644 lib/find_bit.c
 delete mode 100644 lib/find_last_bit.c
 delete mode 100644 lib/find_next_bit.c

-- 
2.1.0

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v4 1/3] lib: find_*_bit reimplementation
  2015-02-17  2:35 ` [PATCH v4 0/3] lib: find_*_bit reimplementation Yury Norov
@ 2015-02-17  2:35   ` Yury Norov
  2015-02-18 17:57     ` Rasmus Villemoes
  2015-02-17  2:35   ` [PATCH v4 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
  2015-02-17  2:35   ` [PATCH v4 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
  2 siblings, 1 reply; 27+ messages in thread
From: Yury Norov @ 2015-02-17  2:35 UTC (permalink / raw)
  To: linux, klimov.linux, linux
  Cc: akpm, davem, dborkman, hannes, laijs, msalter, takahiro.akashi,
	tgraf, valentinrothberg, yury.norov, linux-kernel, chris

The new implementation takes less space in the sources
(see diffstat) and in the object file. For me it's 710 vs 453
bytes of text. It also shows better performance.

The find_last_bit description is fixed due to an obvious typo.

This patch introduces two macros, {LOW,HIGH}_BITS_MASK, which do
almost the same as BITMAP_{FIRST,LAST}_WORD_MASK in
include/linux/bitmap.h. The 'LAST' macro is potentially less
efficient, though, because it issues a conditional branch that can
be omitted. If it is one day replaced by a more efficient
implementation, {LOW,HIGH}_BITS_MASK can be removed.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/bitops.h |   4 +-
 lib/find_last_bit.c    |  37 +++----
 lib/find_next_bit.c    | 269 ++++++++++++++-----------------------------------
 3 files changed, 94 insertions(+), 216 deletions(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 5d858e0..297f5bd 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -218,9 +218,9 @@ static inline unsigned long __ffs64(u64 word)
 /**
  * find_last_bit - find the last set bit in a memory region
  * @addr: The address to start the search at
- * @size: The maximum size to search
+ * @size: The number of bits to search
  *
- * Returns the bit number of the first set bit, or size.
+ * Returns the bit number of the last set bit, or size.
  */
 extern unsigned long find_last_bit(const unsigned long *addr,
 				   unsigned long size);
diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
index 91ca09f..edbb281 100644
--- a/lib/find_last_bit.c
+++ b/lib/find_last_bit.c
@@ -4,6 +4,9 @@
  * Written by Rusty Russell <rusty@rustcorp.com.au>
  * (Inspired by David Howell's find_next_bit implementation)
  *
+ * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
+ * size and improve performance, 2015.
+ *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
  * as published by the Free Software Foundation; either version
@@ -12,36 +15,26 @@
 
 #include <linux/bitops.h>
 #include <linux/export.h>
-#include <asm/types.h>
-#include <asm/byteorder.h>
+#include <linux/kernel.h>
+
+#define LOW_BITS_MASK(nr) (~0UL >> -(nr))
 
 #ifndef find_last_bit
 
 unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
 {
-	unsigned long words;
-	unsigned long tmp;
+	if (size) {
+		unsigned long val = LOW_BITS_MASK(size % BITS_PER_LONG);
+		unsigned long idx = (size-1) / BITS_PER_LONG;
 
-	/* Start at final word. */
-	words = size / BITS_PER_LONG;
+		do {
+			val &= addr[idx];
+			if (val)
+				return idx * BITS_PER_LONG + __fls(val);
 
-	/* Partial final word? */
-	if (size & (BITS_PER_LONG-1)) {
-		tmp = (addr[words] & (~0UL >> (BITS_PER_LONG
-					 - (size & (BITS_PER_LONG-1)))));
-		if (tmp)
-			goto found;
+			val = ~0ul;
+		} while (idx--);
 	}
-
-	while (words) {
-		tmp = addr[--words];
-		if (tmp) {
-found:
-			return words * BITS_PER_LONG + __fls(tmp);
-		}
-	}
-
-	/* Not found */
 	return size;
 }
 EXPORT_SYMBOL(find_last_bit);
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index 0cbfc0b..1f5f108 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -3,6 +3,9 @@
  * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
+ * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
+ * size and improve performance, 2015.
+ *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
  * as published by the Free Software Foundation; either version
@@ -11,98 +14,60 @@
 
 #include <linux/bitops.h>
 #include <linux/export.h>
-#include <asm/types.h>
-#include <asm/byteorder.h>
+#include <linux/kernel.h>
 
-#define BITOP_WORD(nr)		((nr) / BITS_PER_LONG)
+#define HIGH_BITS_MASK(nr)		(~0UL << (nr))
+
+#if !defined(find_next_bit) || !defined(find_next_zero_bit)
 
-#ifndef find_next_bit
 /*
- * Find the next set bit in a memory region.
+ * This is a common helper function for find_next_bit and
+ * find_next_zero_bit.  The difference is the "invert" argument, which
+ * is XORed with each fetched word before searching it for one bits.
  */
-unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
-			    unsigned long offset)
+static unsigned long _find_next_bit(const unsigned long *addr,
+		unsigned long nbits, unsigned long start, unsigned long invert)
 {
-	const unsigned long *p = addr + BITOP_WORD(offset);
-	unsigned long result = offset & ~(BITS_PER_LONG-1);
 	unsigned long tmp;
 
-	if (offset >= size)
-		return size;
-	size -= result;
-	offset %= BITS_PER_LONG;
-	if (offset) {
-		tmp = *(p++);
-		tmp &= (~0UL << offset);
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
-	while (size & ~(BITS_PER_LONG-1)) {
-		if ((tmp = *(p++)))
-			goto found_middle;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	if (!nbits || start >= nbits)
+		return nbits;
+
+	tmp = addr[start / BITS_PER_LONG] ^ invert;
+
+	/* Handle 1st word. */
+	tmp &= HIGH_BITS_MASK(start % BITS_PER_LONG);
+	start = round_down(start, BITS_PER_LONG);
+
+	while (!tmp) {
+		start += BITS_PER_LONG;
+		if (start >= nbits)
+			return nbits;
+
+		tmp = addr[start / BITS_PER_LONG] ^ invert;
 	}
-	if (!size)
-		return result;
-	tmp = *p;
 
-found_first:
-	tmp &= (~0UL >> (BITS_PER_LONG - size));
-	if (tmp == 0UL)		/* Are any bits set? */
-		return result + size;	/* Nope. */
-found_middle:
-	return result + __ffs(tmp);
+	return min(start + __ffs(tmp), nbits);
 }
-EXPORT_SYMBOL(find_next_bit);
 #endif
 
-#ifndef find_next_zero_bit
+#ifndef find_next_bit
 /*
- * This implementation of find_{first,next}_zero_bit was stolen from
- * Linus' asm-alpha/bitops.h.
+ * Find the next set bit in a memory region.
  */
+unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
+			    unsigned long offset)
+{
+	return _find_next_bit(addr, size, offset, 0UL);
+}
+EXPORT_SYMBOL(find_next_bit);
+#endif
+
+#ifndef find_next_zero_bit
 unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
 				 unsigned long offset)
 {
-	const unsigned long *p = addr + BITOP_WORD(offset);
-	unsigned long result = offset & ~(BITS_PER_LONG-1);
-	unsigned long tmp;
-
-	if (offset >= size)
-		return size;
-	size -= result;
-	offset %= BITS_PER_LONG;
-	if (offset) {
-		tmp = *(p++);
-		tmp |= ~0UL >> (BITS_PER_LONG - offset);
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (~tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
-	while (size & ~(BITS_PER_LONG-1)) {
-		if (~(tmp = *(p++)))
-			goto found_middle;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
-	}
-	if (!size)
-		return result;
-	tmp = *p;
-
-found_first:
-	tmp |= ~0UL << size;
-	if (tmp == ~0UL)	/* Are any bits zero? */
-		return result + size;	/* Nope. */
-found_middle:
-	return result + ffz(tmp);
+	return _find_next_bit(addr, size, offset, ~0UL);
 }
 EXPORT_SYMBOL(find_next_zero_bit);
 #endif
@@ -113,24 +78,14 @@ EXPORT_SYMBOL(find_next_zero_bit);
  */
 unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
 {
-	const unsigned long *p = addr;
-	unsigned long result = 0;
-	unsigned long tmp;
+	unsigned long idx;
 
-	while (size & ~(BITS_PER_LONG-1)) {
-		if ((tmp = *(p++)))
-			goto found;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+		if (addr[idx])
+			return min(idx * BITS_PER_LONG + __ffs(addr[idx]), size);
 	}
-	if (!size)
-		return result;
 
-	tmp = (*p) & (~0UL >> (BITS_PER_LONG - size));
-	if (tmp == 0UL)		/* Are any bits set? */
-		return result + size;	/* Nope. */
-found:
-	return result + __ffs(tmp);
+	return size;
 }
 EXPORT_SYMBOL(find_first_bit);
 #endif
@@ -141,24 +96,14 @@ EXPORT_SYMBOL(find_first_bit);
  */
 unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
 {
-	const unsigned long *p = addr;
-	unsigned long result = 0;
-	unsigned long tmp;
+	unsigned long idx;
 
-	while (size & ~(BITS_PER_LONG-1)) {
-		if (~(tmp = *(p++)))
-			goto found;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+		if (addr[idx] != ~0UL)
+			return min(idx * BITS_PER_LONG + ffz(addr[idx]), size);
 	}
-	if (!size)
-		return result;
 
-	tmp = (*p) | (~0UL << size);
-	if (tmp == ~0UL)	/* Are any bits zero? */
-		return result + size;	/* Nope. */
-found:
-	return result + ffz(tmp);
+	return size;
 }
 EXPORT_SYMBOL(find_first_zero_bit);
 #endif
@@ -166,18 +111,6 @@ EXPORT_SYMBOL(find_first_zero_bit);
 #ifdef __BIG_ENDIAN
 
 /* include/linux/byteorder does not support "unsigned long" type */
-static inline unsigned long ext2_swabp(const unsigned long * x)
-{
-#if BITS_PER_LONG == 64
-	return (unsigned long) __swab64p((u64 *) x);
-#elif BITS_PER_LONG == 32
-	return (unsigned long) __swab32p((u32 *) x);
-#else
-#error BITS_PER_LONG not defined
-#endif
-}
-
-/* include/linux/byteorder doesn't support "unsigned long" type */
 static inline unsigned long ext2_swab(const unsigned long y)
 {
 #if BITS_PER_LONG == 64
@@ -189,48 +122,38 @@ static inline unsigned long ext2_swab(const unsigned long y)
 #endif
 }
 
-#ifndef find_next_zero_bit_le
-unsigned long find_next_zero_bit_le(const void *addr, unsigned
-		long size, unsigned long offset)
+#if !defined(find_next_bit_le) || !defined(find_next_zero_bit_le)
+static unsigned long _find_next_bit_le(const unsigned long *addr,
+		unsigned long nbits, unsigned long start, unsigned long invert)
 {
-	const unsigned long *p = addr;
-	unsigned long result = offset & ~(BITS_PER_LONG - 1);
 	unsigned long tmp;
 
-	if (offset >= size)
-		return size;
-	p += BITOP_WORD(offset);
-	size -= result;
-	offset &= (BITS_PER_LONG - 1UL);
-	if (offset) {
-		tmp = ext2_swabp(p++);
-		tmp |= (~0UL >> (BITS_PER_LONG - offset));
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (~tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
+	if (!nbits || start >= nbits)
+		return nbits;
+
+	tmp = addr[start / BITS_PER_LONG] ^ invert;
+
+	/* Handle 1st word. */
+	tmp &= ext2_swab(HIGH_BITS_MASK(start % BITS_PER_LONG));
+	start = round_down(start, BITS_PER_LONG);
 
-	while (size & ~(BITS_PER_LONG - 1)) {
-		if (~(tmp = *(p++)))
-			goto found_middle_swap;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	while (!tmp) {
+		start += BITS_PER_LONG;
+		if (start >= nbits)
+			return nbits;
+
+		tmp = addr[start / BITS_PER_LONG] ^ invert;
 	}
-	if (!size)
-		return result;
-	tmp = ext2_swabp(p);
-found_first:
-	tmp |= ~0UL << size;
-	if (tmp == ~0UL)	/* Are any bits zero? */
-		return result + size; /* Nope. Skip ffz */
-found_middle:
-	return result + ffz(tmp);
 
-found_middle_swap:
-	return result + ffz(ext2_swab(tmp));
+	return min(start + __ffs(ext2_swab(tmp)), nbits);
+}
+#endif
+
+#ifndef find_next_zero_bit_le
+unsigned long find_next_zero_bit_le(const void *addr, unsigned
+		long size, unsigned long offset)
+{
+	return _find_next_bit_le(addr, size, offset, ~0UL);
 }
 EXPORT_SYMBOL(find_next_zero_bit_le);
 #endif
@@ -239,45 +162,7 @@ EXPORT_SYMBOL(find_next_zero_bit_le);
 unsigned long find_next_bit_le(const void *addr, unsigned
 		long size, unsigned long offset)
 {
-	const unsigned long *p = addr;
-	unsigned long result = offset & ~(BITS_PER_LONG - 1);
-	unsigned long tmp;
-
-	if (offset >= size)
-		return size;
-	p += BITOP_WORD(offset);
-	size -= result;
-	offset &= (BITS_PER_LONG - 1UL);
-	if (offset) {
-		tmp = ext2_swabp(p++);
-		tmp &= (~0UL << offset);
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
-
-	while (size & ~(BITS_PER_LONG - 1)) {
-		tmp = *(p++);
-		if (tmp)
-			goto found_middle_swap;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
-	}
-	if (!size)
-		return result;
-	tmp = ext2_swabp(p);
-found_first:
-	tmp &= (~0UL >> (BITS_PER_LONG - size));
-	if (tmp == 0UL)		/* Are any bits set? */
-		return result + size; /* Nope. */
-found_middle:
-	return result + __ffs(tmp);
-
-found_middle_swap:
-	return result + __ffs(ext2_swab(tmp));
+	return _find_next_bit_le(addr, size, offset, 0UL);
 }
 EXPORT_SYMBOL(find_next_bit_le);
 #endif
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 2/3] lib: move find_last_bit to lib/find_next_bit.c
  2015-02-17  2:35 ` [PATCH v4 0/3] lib: find_*_bit reimplementation Yury Norov
  2015-02-17  2:35   ` [PATCH v4 1/3] " Yury Norov
@ 2015-02-17  2:35   ` Yury Norov
  2015-02-17  2:35   ` [PATCH v4 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
  2 siblings, 0 replies; 27+ messages in thread
From: Yury Norov @ 2015-02-17  2:35 UTC (permalink / raw)
  To: linux, klimov.linux, linux
  Cc: akpm, davem, dborkman, hannes, laijs, msalter, takahiro.akashi,
	tgraf, valentinrothberg, yury.norov, linux-kernel, chris

Currently the whole 'find_*_bit' family is located in lib/find_next_bit.c,
except for 'find_last_bit', which is in lib/find_last_bit.c. There seems
to be no major benefit in keeping it separate.
---
 lib/Makefile        |  2 +-
 lib/find_last_bit.c | 42 ------------------------------------------
 lib/find_next_bit.c | 27 ++++++++++++++++++++++++++-
 3 files changed, 27 insertions(+), 44 deletions(-)
 delete mode 100644 lib/find_last_bit.c

diff --git a/lib/Makefile b/lib/Makefile
index 87eb3bf..9dafcd5 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -25,7 +25,7 @@ obj-y	+= lockref.o
 obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 	 bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o clz_ctz.o \
-	 bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \
+	 bsearch.o find_next_bit.o llist.o memweight.o kfifo.o \
 	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
deleted file mode 100644
index edbb281..0000000
--- a/lib/find_last_bit.c
+++ /dev/null
@@ -1,42 +0,0 @@
-/* find_last_bit.c: fallback find next bit implementation
- *
- * Copyright (C) 2008 IBM Corporation
- * Written by Rusty Russell <rusty@rustcorp.com.au>
- * (Inspired by David Howell's find_next_bit implementation)
- *
- * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
- * size and improve performance, 2015.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#include <linux/bitops.h>
-#include <linux/export.h>
-#include <linux/kernel.h>
-
-#define LOW_BITS_MASK(nr) (~0UL >> -(nr))
-
-#ifndef find_last_bit
-
-unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
-{
-	if (size) {
-		unsigned long val = LOW_BITS_MASK(size % BITS_PER_LONG);
-		unsigned long idx = (size-1) / BITS_PER_LONG;
-
-		do {
-			val &= addr[idx];
-			if (val)
-				return idx * BITS_PER_LONG + __fls(val);
-
-			val = ~0ul;
-		} while (idx--);
-	}
-	return size;
-}
-EXPORT_SYMBOL(find_last_bit);
-
-#endif
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index 1f5f108..627d0ec 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -1,8 +1,12 @@
-/* find_next_bit.c: fallback find next bit implementation
+/* bit search implementation
  *
  * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
+ * Copyright (C) 2008 IBM Corporation
+ * 'find_last_bit' is written by Rusty Russell <rusty@rustcorp.com.au>
+ * (Inspired by David Howell's find_next_bit implementation)
+ *
  * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
  * size and improve performance, 2015.
  *
@@ -17,6 +21,7 @@
 #include <linux/kernel.h>
 
 #define HIGH_BITS_MASK(nr)		(~0UL << (nr))
+#define LOW_BITS_MASK(nr)		(~0UL >> -(nr))
 
 #if !defined(find_next_bit) || !defined(find_next_zero_bit)
 
@@ -108,6 +113,26 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
 EXPORT_SYMBOL(find_first_zero_bit);
 #endif
 
+#ifndef find_last_bit
+unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
+{
+	if (size) {
+		unsigned long val = LOW_BITS_MASK(size % BITS_PER_LONG);
+		unsigned long idx = (size-1) / BITS_PER_LONG;
+
+		do {
+			val &= addr[idx];
+			if (val)
+				return idx * BITS_PER_LONG + __fls(val);
+
+			val = ~0ul;
+		} while (idx--);
+	}
+	return size;
+}
+EXPORT_SYMBOL(find_last_bit);
+#endif
+
 #ifdef __BIG_ENDIAN
 
 /* include/linux/byteorder does not support "unsigned long" type */
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v4 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c
  2015-02-17  2:35 ` [PATCH v4 0/3] lib: find_*_bit reimplementation Yury Norov
  2015-02-17  2:35   ` [PATCH v4 1/3] " Yury Norov
  2015-02-17  2:35   ` [PATCH v4 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
@ 2015-02-17  2:35   ` Yury Norov
  2 siblings, 0 replies; 27+ messages in thread
From: Yury Norov @ 2015-02-17  2:35 UTC (permalink / raw)
  To: linux, klimov.linux, linux
  Cc: akpm, davem, dborkman, hannes, laijs, msalter, takahiro.akashi,
	tgraf, valentinrothberg, yury.norov, linux-kernel, chris

This file contains the implementation of the whole find_*_bit{,_le}
family, so giving it a more generic name looks reasonable.
---
 lib/Makefile                        | 2 +-
 lib/{find_next_bit.c => find_bit.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename lib/{find_next_bit.c => find_bit.c} (100%)

diff --git a/lib/Makefile b/lib/Makefile
index 9dafcd5..d857965 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -25,7 +25,7 @@ obj-y	+= lockref.o
 obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 	 bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o clz_ctz.o \
-	 bsearch.o find_next_bit.o llist.o memweight.o kfifo.o \
+	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
 	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/find_next_bit.c b/lib/find_bit.c
similarity index 100%
rename from lib/find_next_bit.c
rename to lib/find_bit.c
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v4 1/3] lib: find_*_bit reimplementation
  2015-02-17  2:35   ` [PATCH v4 1/3] " Yury Norov
@ 2015-02-18 17:57     ` Rasmus Villemoes
  0 siblings, 0 replies; 27+ messages in thread
From: Rasmus Villemoes @ 2015-02-18 17:57 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux, klimov.linux, akpm, davem, dborkman, hannes, laijs,
	msalter, takahiro.akashi, tgraf, valentinrothberg, linux-kernel,
	chris

On Tue, Feb 17 2015, Yury Norov <yury.norov@gmail.com> wrote:

> The new implementation takes less space in the sources
> (see diffstat) and in the object file. For me it's 710 vs 453
> bytes of text. It also shows better performance.
>
> The find_last_bit description is fixed due to an obvious typo.
>
> This patch introduces two macros, {LOW,HIGH}_BITS_MASK, which do
> almost the same as BITMAP_{FIRST,LAST}_WORD_MASK in
> include/linux/bitmap.h. The 'LAST' macro is potentially less
> efficient, though, because it issues a conditional branch that can
> be omitted. If it is one day replaced by a more efficient
> implementation, {LOW,HIGH}_BITS_MASK can be removed.
>

I think it's better to use the existing macros and then improve them
instead of duplicating the functionality. I'll submit a patch for that
later tonight (that should then make it to 3.21 [or whatever 3.19+2 will
be called] together with this series). More on this issue below.

>
> Signed-off-by: Yury Norov <yury.norov@gmail.com>
> ---
>  include/linux/bitops.h |   4 +-
>  lib/find_last_bit.c    |  37 +++----
>  lib/find_next_bit.c    | 269 ++++++++++++++-----------------------------------
>  3 files changed, 94 insertions(+), 216 deletions(-)
>
> diff --git a/include/linux/bitops.h b/include/linux/bitops.h
> index 5d858e0..297f5bd 100644
> --- a/include/linux/bitops.h
> +++ b/include/linux/bitops.h
> @@ -218,9 +218,9 @@ static inline unsigned long __ffs64(u64 word)
>  /**
>   * find_last_bit - find the last set bit in a memory region
>   * @addr: The address to start the search at
> - * @size: The maximum size to search
> + * @size: The number of bits to search
>   *
> - * Returns the bit number of the first set bit, or size.
> + * Returns the bit number of the last set bit, or size.
>   */
>  extern unsigned long find_last_bit(const unsigned long *addr,
>  				   unsigned long size);
> diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
> index 91ca09f..edbb281 100644
> --- a/lib/find_last_bit.c
> +++ b/lib/find_last_bit.c
> @@ -4,6 +4,9 @@
>   * Written by Rusty Russell <rusty@rustcorp.com.au>
>   * (Inspired by David Howell's find_next_bit implementation)
>   *
> + * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
> + * size and improve performance, 2015.
> + *
>   * This program is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU General Public License
>   * as published by the Free Software Foundation; either version
> @@ -12,36 +15,26 @@
>  
>  #include <linux/bitops.h>
>  #include <linux/export.h>
> -#include <asm/types.h>
> -#include <asm/byteorder.h>
> +#include <linux/kernel.h>
> +
> +#define LOW_BITS_MASK(nr) (~0UL >> -(nr))

This is technically wrong, and may not even work on architectures that
are not as forgiving as x86: Whatever type and value nr has, -(nr) is
almost guaranteed not to be a number between 0 and BITS_PER_LONG-1. And
even on x86, gcc doesn't generate as good code as it could:

 163:   49 c7 c0 ff ff ff ff    mov    $0xffffffffffffffff,%r8
 16a:   83 e1 3f                and    $0x3f,%ecx
 16d:   f7 d9                   neg    %ecx
 16f:   48 c1 ea 06             shr    $0x6,%rdx
 173:   49 d3 e8                shr    %cl,%r8

It doesn't realize that pre-masking %ecx with 0x3f is redundant when we
then negate it and use it as a shift amount.

So a better definition of the macro is

#define BITMAP_LAST_WORD_MASK(nr) (~0UL >> (-(nr) & (BITS_PER_LONG-1)))

and then callers shouldn't do the modulo. On x86, gcc knows that the &
is redundant. I use & instead of % so that nr may also have a signed
type (otherwise we're again in UB land, since -(nr) % BITS_PER_LONG is
then, by the broken C standard, a negative number).
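
A quick userspace check of the improved definition (a sketch; with
nr = 0 the plain "~0UL >> -(nr)" variant shifts by an out-of-range
amount, which -fsanitize=undefined will flag, while the masked variant
below stays defined for any nr):

	#include <stdio.h>

	#define BITS_PER_LONG 64	/* assumption: LP64 */
	#define BITMAP_LAST_WORD_MASK(nr) \
		(~0UL >> (-(nr) & (BITS_PER_LONG - 1)))

	int main(void)
	{
		int nr;	/* deliberately signed, as discussed above */

		/* nr == 0 and nr == 64 both yield the full-word mask. */
		for (nr = 0; nr <= BITS_PER_LONG; nr++)
			printf("%2d: %016lx\n", nr, BITMAP_LAST_WORD_MASK(nr));
		return 0;
	}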


>  #include <linux/bitops.h>
>  #include <linux/export.h>
> -#include <asm/types.h>
> -#include <asm/byteorder.h>
> +#include <linux/kernel.h>
>  
> -#define BITOP_WORD(nr)		((nr) / BITS_PER_LONG)
> +#define HIGH_BITS_MASK(nr)		(~0UL << (nr))
> +
> +#if !defined(find_next_bit) || !defined(find_next_zero_bit)
>  
> -#ifndef find_next_bit
>  /*
> - * Find the next set bit in a memory region.
> + * This is a common helper function for find_next_bit and
> + * find_next_zero_bit.  The difference is the "invert" argument, which
> + * is XORed with each fetched word before searching it for one bits.
>   */
> -unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
> -			    unsigned long offset)
> +static unsigned long _find_next_bit(const unsigned long *addr,
> +		unsigned long nbits, unsigned long start, unsigned long invert)
>  {
> -	const unsigned long *p = addr + BITOP_WORD(offset);
> -	unsigned long result = offset & ~(BITS_PER_LONG-1);
>  	unsigned long tmp;
>  
> -	if (offset >= size)
> -		return size;
> -	size -= result;
> -	offset %= BITS_PER_LONG;
> -	if (offset) {
> -		tmp = *(p++);
> -		tmp &= (~0UL << offset);
> -		if (size < BITS_PER_LONG)
> -			goto found_first;
> -		if (tmp)
> -			goto found_middle;
> -		size -= BITS_PER_LONG;
> -		result += BITS_PER_LONG;
> -	}
> -	while (size & ~(BITS_PER_LONG-1)) {
> -		if ((tmp = *(p++)))
> -			goto found_middle;
> -		result += BITS_PER_LONG;
> -		size -= BITS_PER_LONG;
> +	if (!nbits || start >= nbits)
> +		return nbits;
> +
> +	tmp = addr[start / BITS_PER_LONG] ^ invert;
> +
> +	/* Handle 1st word. */
> +	tmp &= HIGH_BITS_MASK(start % BITS_PER_LONG);

And of course here, I'd then suggest using BITMAP_FIRST_WORD_MASK(start)
(that even matches the comment :-)), omitting the definition of
HIGH_BITS_MASK.
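
As an aside, the "invert" argument quoted above works because
word ^ 0UL leaves the word unchanged, while word ^ ~0UL flips it, so
the zero bits of the original become the set bits that __ffs() finds.
A userspace sketch, with __builtin_ctzl standing in for the kernel's
arch-specific __ffs():

	#include <stdio.h>

	int main(void)
	{
		unsigned long word = 0xf0UL;

		/* find_next_bit path: search the word as-is. */
		printf("first set bit:  %d\n", __builtin_ctzl(word ^ 0UL));
		/* find_next_zero_bit path: flip, then look for set bits. */
		printf("first zero bit: %d\n", __builtin_ctzl(word ^ ~0UL));
		return 0;	/* prints 4, then 0 */
	}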

> @@ -113,24 +78,14 @@ EXPORT_SYMBOL(find_next_zero_bit);
>   */
>  unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
>  {
> -	const unsigned long *p = addr;
> -	unsigned long result = 0;
> -	unsigned long tmp;
> +	unsigned long idx;
>  
> -	while (size & ~(BITS_PER_LONG-1)) {
> -		if ((tmp = *(p++)))
> -			goto found;
> -		result += BITS_PER_LONG;
> -		size -= BITS_PER_LONG;
> +	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
> +		if (addr[idx])
> +			return min(idx * BITS_PER_LONG + __ffs(addr[idx]), size);
>  	}
> -	if (!size)
> -		return result;
>  
> -	tmp = (*p) & (~0UL >> (BITS_PER_LONG - size));
> -	if (tmp == 0UL)		/* Are any bits set? */
> -		return result + size;	/* Nope. */
> -found:
> -	return result + __ffs(tmp);
> +	return size;
>  }
>  EXPORT_SYMBOL(find_first_bit);
>  #endif
> @@ -141,24 +96,14 @@ EXPORT_SYMBOL(find_first_bit);
>   */
>  unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
>  {
> -	const unsigned long *p = addr;
> -	unsigned long result = 0;
> -	unsigned long tmp;
> +	unsigned long idx;
>  
> -	while (size & ~(BITS_PER_LONG-1)) {
> -		if (~(tmp = *(p++)))
> -			goto found;
> -		result += BITS_PER_LONG;
> -		size -= BITS_PER_LONG;
> +	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
> +		if (addr[idx] != ~0UL)
> +			return min(idx * BITS_PER_LONG + ffz(addr[idx]), size);
>  	}

Since I'm afraid the above means I have to ask you to send a v5, I might
as well also comment on this: I think it would make the code much more
obviously parallel to find_first_bit if the test were "if (~addr[idx])"
and ffz were then replaced by __ffs(~addr[idx]). Many architectures
implement ffz(x) as __ffs(~x) anyway, so it shouldn't be any less
efficient. But it's no big deal, so if you feel the current version is
better, just leave it.
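
Concretely, the suggested variant would read something like this
(a sketch of the change, not tested):

	unsigned long find_first_zero_bit(const unsigned long *addr,
					  unsigned long size)
	{
		unsigned long idx;

		for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
			/* ~addr[idx] != 0 iff some bit is clear. */
			if (~addr[idx])
				return min(idx * BITS_PER_LONG +
					   __ffs(~addr[idx]), size);
		}

		return size;
	}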

Rasmus

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v5 0/3] lib: find_*_bit reimplementation
  2015-02-08 14:10 [PATCH v3 0/3] lib: find_*_bit reimplementation Yury Norov
                   ` (3 preceding siblings ...)
  2015-02-17  2:35 ` [PATCH v4 0/3] lib: find_*_bit reimplementation Yury Norov
@ 2015-02-22 17:24 ` Yury Norov
  2015-02-22 17:24   ` [PATCH v5 1/3] " Yury Norov
                     ` (3 more replies)
  4 siblings, 4 replies; 27+ messages in thread
From: Yury Norov @ 2015-02-22 17:24 UTC (permalink / raw)
  To: linux, klimov.linux, linux
  Cc: akpm, davem, dborkman, hannes, laijs, msalter, takahiro.akashi,
	tgraf, valentinrothberg, yury.norov, linux-kernel, chris

This patchset reworks the find_bit function family to achieve better
performance and decrease the size of the text section. All rework is
done in patch 1; patches 2 and 3 are about code moving and renaming.

It was boot-tested on x86_64 and MIPS (big-endian) machines.
Performance tests were run in userspace with code like this:

	/* addr[] is filled from /dev/urandom */
	start = clock();
	while (ret < nbits)
		ret = find_next_bit(addr, nbits, ret + 1);

	end = clock();
	printf("%ld\t", (unsigned long) end - start);

On an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, the measurements are as
follows (for find_next_bit, nbits is 8M; for find_first_bit, 80K):

	find_next_bit:		find_first_bit:
	new	current		new	current
	26932	43151		14777	14925
	26947	43182		14521	15423
	26507	43824		15053	14705
	27329	43759		14473	14777
	26895	43367		14847	15023
	26990	43693		15103	15163
	26775	43299		15067	15232
	27282	42752		14544	15121
	27504	43088		14644	14858
	26761	43856		14699	15193
	26692	43075		14781	14681
	27137	42969		14451	15061
	...			...

The find_next_bit performance gain is 35-40%;
find_first_bit shows no measurable difference.

On ARM machines, there is an arch-specific implementation of find_bit.
To disable it and use the generic one, apply the following patch:
	
	---
	 arch/arm/include/asm/bitops.h   | 19 -------------------
	 arch/arm/kernel/armksyms.c      | 11 -----------
	 arch/arm/lib/Makefile           |  2 +-
	 include/asm-generic/bitops/le.h |  1 +
	 4 files changed, 2 insertions(+), 31 deletions(-)

	diff --git a/arch/arm/include/asm/bitops.h b/arch/arm/include/asm/bitops.h
	index 5638099..e0611d1 100644
	--- a/arch/arm/include/asm/bitops.h
	+++ b/arch/arm/include/asm/bitops.h
	@@ -192,25 +192,6 @@ extern int _find_next_bit_be(const unsigned long *p, int size, int offset);
	 #define test_and_clear_bit(nr,p)	ATOMIC_BITOP(test_and_clear_bit,nr,p)
	 #define test_and_change_bit(nr,p)	ATOMIC_BITOP(test_and_change_bit,nr,p)
	 
	-#ifndef __ARMEB__
	-/*
	- * These are the little endian, atomic definitions.
	- */
	-#define find_first_zero_bit(p,sz)	_find_first_zero_bit_le(p,sz)
	-#define find_next_zero_bit(p,sz,off)	_find_next_zero_bit_le(p,sz,off)
	-#define find_first_bit(p,sz)		_find_first_bit_le(p,sz)
	-#define find_next_bit(p,sz,off)		_find_next_bit_le(p,sz,off)
	-
	-#else
	-/*
	- * These are the big endian, atomic definitions.
	- */
	-#define find_first_zero_bit(p,sz)	_find_first_zero_bit_be(p,sz)
	-#define find_next_zero_bit(p,sz,off)	_find_next_zero_bit_be(p,sz,off)
	-#define find_first_bit(p,sz)		_find_first_bit_be(p,sz)
	-#define find_next_bit(p,sz,off)		_find_next_bit_be(p,sz,off)
	-
	-#endif
	 
	 #if __LINUX_ARM_ARCH__ < 5
	 
	diff --git a/arch/arm/kernel/armksyms.c b/arch/arm/kernel/armksyms.c
	index a88671c..22e8748 100644
	--- a/arch/arm/kernel/armksyms.c
	+++ b/arch/arm/kernel/armksyms.c
	@@ -146,17 +146,6 @@ EXPORT_SYMBOL(_clear_bit);
	 EXPORT_SYMBOL(_test_and_clear_bit);
	 EXPORT_SYMBOL(_change_bit);
	 EXPORT_SYMBOL(_test_and_change_bit);
	-EXPORT_SYMBOL(_find_first_zero_bit_le);
	-EXPORT_SYMBOL(_find_next_zero_bit_le);
	-EXPORT_SYMBOL(_find_first_bit_le);
	-EXPORT_SYMBOL(_find_next_bit_le);
	-
	-#ifdef __ARMEB__
	-EXPORT_SYMBOL(_find_first_zero_bit_be);
	-EXPORT_SYMBOL(_find_next_zero_bit_be);
	-EXPORT_SYMBOL(_find_first_bit_be);
	-EXPORT_SYMBOL(_find_next_bit_be);
	-#endif
	 
	 #ifdef CONFIG_FUNCTION_TRACER
	 #ifdef CONFIG_OLD_MCOUNT
	diff --git a/arch/arm/lib/Makefile b/arch/arm/lib/Makefile
	index 0573faa..de369aa 100644
	--- a/arch/arm/lib/Makefile
	+++ b/arch/arm/lib/Makefile
	@@ -6,7 +6,7 @@
	 
	 lib-y		:= backtrace.o changebit.o csumipv6.o csumpartial.o   \
			   csumpartialcopy.o csumpartialcopyuser.o clearbit.o \
	-		   delay.o delay-loop.o findbit.o memchr.o memcpy.o   \
	+		   delay.o delay-loop.o memchr.o memcpy.o	      \
			   memmove.o memset.o memzero.o setbit.o              \
			   strchr.o strrchr.o                                 \
			   testchangebit.o testclearbit.o testsetbit.o        \
	diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h
	index 6173154..9a8798f 100644
	--- a/include/asm-generic/bitops/le.h
	+++ b/include/asm-generic/bitops/le.h
	@@ -2,6 +2,7 @@
	 #define _ASM_GENERIC_BITOPS_LE_H_
	 
	 #include <asm/types.h>
	+#include <asm-generic/bitops/find.h>
	 #include <asm/byteorder.h>
	 
	 #if defined(__LITTLE_ENDIAN)
	-- 
	1.9.1

Thanks a lot to George Spelvin and Rasmus Villemoes for hints and
helpful discussions.
	
v5:
* the {LOW,HIGH}_BITS_MASK macros are removed; BITMAP_{FIRST,LAST}_WORD_MASK
  is used instead.

v4:
* find_last_bit was found buggy and replaced with George Spelvin's
  version, which correctly handles garbage at the end of a
  word-unaligned bitmap;
* find_last_bit description fixed in 'include/linux/bitops.h';
* _find_next_bit{,_le}: first-word handling made unconditional;

v3:
* conditional bit inverting in _find_next_bit replaced with XORing by
  a mask (patch 1);
* size/offset checks moved from the wrappers to _find_next_bit (patch 1);
* #include <linux/bitops.h> restored (patch 1);
* lib/find_next_bit file description changed to avoid referring to the
  file name, so that the rename produces a clean diff (patch 2);
* patch 3 is generated with the '-M' option to detect the rename and
  drop the file contents;
* this cover letter added.

Yury Norov (3):
  lib: find_*_bit reimplementation
  lib: move find_last_bit to lib/find_next_bit.c
  lib: rename lib/find_next_bit.c to lib/find_bit.c

 include/linux/bitops.h |   4 +-
 lib/Makefile           |   2 +-
 lib/find_bit.c         | 195 +++++++++++++++++++++++++++++++++
 lib/find_last_bit.c    |  49 ---------
 lib/find_next_bit.c    | 285 -------------------------------------------------
 5 files changed, 198 insertions(+), 337 deletions(-)
 create mode 100644 lib/find_bit.c
 delete mode 100644 lib/find_last_bit.c
 delete mode 100644 lib/find_next_bit.c

-- 
2.1.0

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH v5 1/3] lib: find_*_bit reimplementation
  2015-02-22 17:24 ` [PATCH v5 0/3] lib: find_*_bit reimplementation Yury Norov
@ 2015-02-22 17:24   ` Yury Norov
  2015-02-23 21:50     ` Rasmus Villemoes
  2015-02-22 17:24   ` [PATCH v5 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 27+ messages in thread
From: Yury Norov @ 2015-02-22 17:24 UTC (permalink / raw)
  To: linux, klimov.linux, linux
  Cc: akpm, davem, dborkman, hannes, laijs, msalter, takahiro.akashi,
	tgraf, valentinrothberg, yury.norov, linux-kernel, chris

The new implementation takes less space in the source file (see
diffstat) and in the object file. For me it's 710 vs 453 bytes of
text. It also shows better performance.

The find_last_bit description is fixed due to an obvious typo.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
---
 include/linux/bitops.h |   4 +-
 lib/find_last_bit.c    |  35 +++----
 lib/find_next_bit.c    | 267 ++++++++++++++-----------------------------------
 3 files changed, 90 insertions(+), 216 deletions(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index 5d858e0..297f5bd 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -218,9 +218,9 @@ static inline unsigned long __ffs64(u64 word)
 /**
  * find_last_bit - find the last set bit in a memory region
  * @addr: The address to start the search at
- * @size: The maximum size to search
+ * @size: The number of bits to search
  *
- * Returns the bit number of the first set bit, or size.
+ * Returns the bit number of the last set bit, or size.
  */
 extern unsigned long find_last_bit(const unsigned long *addr,
 				   unsigned long size);
diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
index 91ca09f..f7b594e 100644
--- a/lib/find_last_bit.c
+++ b/lib/find_last_bit.c
@@ -4,6 +4,9 @@
  * Written by Rusty Russell <rusty@rustcorp.com.au>
  * (Inspired by David Howell's find_next_bit implementation)
  *
+ * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
+ * size and improve performance, 2015.
+ *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
  * as published by the Free Software Foundation; either version
@@ -12,36 +15,24 @@
 
 #include <linux/bitops.h>
 #include <linux/export.h>
-#include <asm/types.h>
-#include <asm/byteorder.h>
+#include <linux/kernel.h>
 
 #ifndef find_last_bit
 
 unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
 {
-	unsigned long words;
-	unsigned long tmp;
-
-	/* Start at final word. */
-	words = size / BITS_PER_LONG;
+	if (size) {
+		unsigned long val = BITMAP_LAST_WORD_MASK(size);
+		unsigned long idx = (size-1) / BITS_PER_LONG;
 
-	/* Partial final word? */
-	if (size & (BITS_PER_LONG-1)) {
-		tmp = (addr[words] & (~0UL >> (BITS_PER_LONG
-					 - (size & (BITS_PER_LONG-1)))));
-		if (tmp)
-			goto found;
-	}
+		do {
+			val &= addr[idx];
+			if (val)
+				return idx * BITS_PER_LONG + __fls(val);
 
-	while (words) {
-		tmp = addr[--words];
-		if (tmp) {
-found:
-			return words * BITS_PER_LONG + __fls(tmp);
-		}
+			val = ~0ul;
+		} while (idx--);
 	}
-
-	/* Not found */
 	return size;
 }
 EXPORT_SYMBOL(find_last_bit);
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index 0cbfc0b..cbea5ef 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -3,6 +3,9 @@
  * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
+ * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
+ * size and improve performance, 2015.
+ *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
  * as published by the Free Software Foundation; either version
@@ -11,98 +14,58 @@
 
 #include <linux/bitops.h>
 #include <linux/export.h>
-#include <asm/types.h>
-#include <asm/byteorder.h>
+#include <linux/kernel.h>
 
-#define BITOP_WORD(nr)		((nr) / BITS_PER_LONG)
+#if !defined(find_next_bit) || !defined(find_next_zero_bit)
 
-#ifndef find_next_bit
 /*
- * Find the next set bit in a memory region.
+ * This is a common helper function for find_next_bit and
+ * find_next_zero_bit.  The difference is the "invert" argument, which
+ * is XORed with each fetched word before searching it for one bits.
  */
-unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
-			    unsigned long offset)
+static unsigned long _find_next_bit(const unsigned long *addr,
+		unsigned long nbits, unsigned long start, unsigned long invert)
 {
-	const unsigned long *p = addr + BITOP_WORD(offset);
-	unsigned long result = offset & ~(BITS_PER_LONG-1);
 	unsigned long tmp;
 
-	if (offset >= size)
-		return size;
-	size -= result;
-	offset %= BITS_PER_LONG;
-	if (offset) {
-		tmp = *(p++);
-		tmp &= (~0UL << offset);
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
-	while (size & ~(BITS_PER_LONG-1)) {
-		if ((tmp = *(p++)))
-			goto found_middle;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	if (!nbits || start >= nbits)
+		return nbits;
+
+	tmp = addr[start / BITS_PER_LONG] ^ invert;
+
+	/* Handle 1st word. */
+	tmp &= BITMAP_FIRST_WORD_MASK(start);
+	start = round_down(start, BITS_PER_LONG);
+
+	while (!tmp) {
+		start += BITS_PER_LONG;
+		if (start >= nbits)
+			return nbits;
+
+		tmp = addr[start / BITS_PER_LONG] ^ invert;
 	}
-	if (!size)
-		return result;
-	tmp = *p;
 
-found_first:
-	tmp &= (~0UL >> (BITS_PER_LONG - size));
-	if (tmp == 0UL)		/* Are any bits set? */
-		return result + size;	/* Nope. */
-found_middle:
-	return result + __ffs(tmp);
+	return min(start + __ffs(tmp), nbits);
 }
-EXPORT_SYMBOL(find_next_bit);
 #endif
 
-#ifndef find_next_zero_bit
+#ifndef find_next_bit
 /*
- * This implementation of find_{first,next}_zero_bit was stolen from
- * Linus' asm-alpha/bitops.h.
+ * Find the next set bit in a memory region.
  */
+unsigned long find_next_bit(const unsigned long *addr, unsigned long size,
+			    unsigned long offset)
+{
+	return _find_next_bit(addr, size, offset, 0UL);
+}
+EXPORT_SYMBOL(find_next_bit);
+#endif
+
+#ifndef find_next_zero_bit
 unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
 				 unsigned long offset)
 {
-	const unsigned long *p = addr + BITOP_WORD(offset);
-	unsigned long result = offset & ~(BITS_PER_LONG-1);
-	unsigned long tmp;
-
-	if (offset >= size)
-		return size;
-	size -= result;
-	offset %= BITS_PER_LONG;
-	if (offset) {
-		tmp = *(p++);
-		tmp |= ~0UL >> (BITS_PER_LONG - offset);
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (~tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
-	while (size & ~(BITS_PER_LONG-1)) {
-		if (~(tmp = *(p++)))
-			goto found_middle;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
-	}
-	if (!size)
-		return result;
-	tmp = *p;
-
-found_first:
-	tmp |= ~0UL << size;
-	if (tmp == ~0UL)	/* Are any bits zero? */
-		return result + size;	/* Nope. */
-found_middle:
-	return result + ffz(tmp);
+	return _find_next_bit(addr, size, offset, ~0UL);
 }
 EXPORT_SYMBOL(find_next_zero_bit);
 #endif
@@ -113,24 +76,14 @@ EXPORT_SYMBOL(find_next_zero_bit);
  */
 unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
 {
-	const unsigned long *p = addr;
-	unsigned long result = 0;
-	unsigned long tmp;
+	unsigned long idx;
 
-	while (size & ~(BITS_PER_LONG-1)) {
-		if ((tmp = *(p++)))
-			goto found;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+		if (addr[idx])
+			return min(idx * BITS_PER_LONG + __ffs(addr[idx]), size);
 	}
-	if (!size)
-		return result;
 
-	tmp = (*p) & (~0UL >> (BITS_PER_LONG - size));
-	if (tmp == 0UL)		/* Are any bits set? */
-		return result + size;	/* Nope. */
-found:
-	return result + __ffs(tmp);
+	return size;
 }
 EXPORT_SYMBOL(find_first_bit);
 #endif
@@ -141,24 +94,14 @@ EXPORT_SYMBOL(find_first_bit);
  */
 unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
 {
-	const unsigned long *p = addr;
-	unsigned long result = 0;
-	unsigned long tmp;
+	unsigned long idx;
 
-	while (size & ~(BITS_PER_LONG-1)) {
-		if (~(tmp = *(p++)))
-			goto found;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
+		if (addr[idx] != ~0UL)
+			return min(idx * BITS_PER_LONG + ffz(addr[idx]), size);
 	}
-	if (!size)
-		return result;
 
-	tmp = (*p) | (~0UL << size);
-	if (tmp == ~0UL)	/* Are any bits zero? */
-		return result + size;	/* Nope. */
-found:
-	return result + ffz(tmp);
+	return size;
 }
 EXPORT_SYMBOL(find_first_zero_bit);
 #endif
@@ -166,18 +109,6 @@ EXPORT_SYMBOL(find_first_zero_bit);
 #ifdef __BIG_ENDIAN
 
 /* include/linux/byteorder does not support "unsigned long" type */
-static inline unsigned long ext2_swabp(const unsigned long * x)
-{
-#if BITS_PER_LONG == 64
-	return (unsigned long) __swab64p((u64 *) x);
-#elif BITS_PER_LONG == 32
-	return (unsigned long) __swab32p((u32 *) x);
-#else
-#error BITS_PER_LONG not defined
-#endif
-}
-
-/* include/linux/byteorder doesn't support "unsigned long" type */
 static inline unsigned long ext2_swab(const unsigned long y)
 {
 #if BITS_PER_LONG == 64
@@ -189,48 +120,38 @@ static inline unsigned long ext2_swab(const unsigned long y)
 #endif
 }
 
-#ifndef find_next_zero_bit_le
-unsigned long find_next_zero_bit_le(const void *addr, unsigned
-		long size, unsigned long offset)
+#if !defined(find_next_bit_le) || !defined(find_next_zero_bit_le)
+static unsigned long _find_next_bit_le(const unsigned long *addr,
+		unsigned long nbits, unsigned long start, unsigned long invert)
 {
-	const unsigned long *p = addr;
-	unsigned long result = offset & ~(BITS_PER_LONG - 1);
 	unsigned long tmp;
 
-	if (offset >= size)
-		return size;
-	p += BITOP_WORD(offset);
-	size -= result;
-	offset &= (BITS_PER_LONG - 1UL);
-	if (offset) {
-		tmp = ext2_swabp(p++);
-		tmp |= (~0UL >> (BITS_PER_LONG - offset));
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (~tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
+	if (!nbits || start >= nbits)
+		return nbits;
+
+	tmp = addr[start / BITS_PER_LONG] ^ invert;
+
+	/* Handle 1st word. */
+	tmp &= ext2_swab(BITMAP_FIRST_WORD_MASK(start));
+	start = round_down(start, BITS_PER_LONG);
 
-	while (size & ~(BITS_PER_LONG - 1)) {
-		if (~(tmp = *(p++)))
-			goto found_middle_swap;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
+	while (!tmp) {
+		start += BITS_PER_LONG;
+		if (start >= nbits)
+			return nbits;
+
+		tmp = addr[start / BITS_PER_LONG] ^ invert;
 	}
-	if (!size)
-		return result;
-	tmp = ext2_swabp(p);
-found_first:
-	tmp |= ~0UL << size;
-	if (tmp == ~0UL)	/* Are any bits zero? */
-		return result + size; /* Nope. Skip ffz */
-found_middle:
-	return result + ffz(tmp);
 
-found_middle_swap:
-	return result + ffz(ext2_swab(tmp));
+	return min(start + __ffs(ext2_swab(tmp)), nbits);
+}
+#endif
+
+#ifndef find_next_zero_bit_le
+unsigned long find_next_zero_bit_le(const void *addr, unsigned
+		long size, unsigned long offset)
+{
+	return _find_next_bit_le(addr, size, offset, ~0UL);
 }
 EXPORT_SYMBOL(find_next_zero_bit_le);
 #endif
@@ -239,45 +160,7 @@ EXPORT_SYMBOL(find_next_zero_bit_le);
 unsigned long find_next_bit_le(const void *addr, unsigned
 		long size, unsigned long offset)
 {
-	const unsigned long *p = addr;
-	unsigned long result = offset & ~(BITS_PER_LONG - 1);
-	unsigned long tmp;
-
-	if (offset >= size)
-		return size;
-	p += BITOP_WORD(offset);
-	size -= result;
-	offset &= (BITS_PER_LONG - 1UL);
-	if (offset) {
-		tmp = ext2_swabp(p++);
-		tmp &= (~0UL << offset);
-		if (size < BITS_PER_LONG)
-			goto found_first;
-		if (tmp)
-			goto found_middle;
-		size -= BITS_PER_LONG;
-		result += BITS_PER_LONG;
-	}
-
-	while (size & ~(BITS_PER_LONG - 1)) {
-		tmp = *(p++);
-		if (tmp)
-			goto found_middle_swap;
-		result += BITS_PER_LONG;
-		size -= BITS_PER_LONG;
-	}
-	if (!size)
-		return result;
-	tmp = ext2_swabp(p);
-found_first:
-	tmp &= (~0UL >> (BITS_PER_LONG - size));
-	if (tmp == 0UL)		/* Are any bits set? */
-		return result + size; /* Nope. */
-found_middle:
-	return result + __ffs(tmp);
-
-found_middle_swap:
-	return result + __ffs(ext2_swab(tmp));
+	return _find_next_bit_le(addr, size, offset, 0UL);
 }
 EXPORT_SYMBOL(find_next_bit_le);
 #endif
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 2/3] lib: move find_last_bit to lib/find_next_bit.c
  2015-02-22 17:24 ` [PATCH v5 0/3] lib: find_*_bit reimplementation Yury Norov
  2015-02-22 17:24   ` [PATCH v5 1/3] " Yury Norov
@ 2015-02-22 17:24   ` Yury Norov
  2015-02-22 17:24   ` [PATCH v5 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
  2015-02-24  0:40   ` [PATCH v5 0/3] lib: find_*_bit reimplementation Andrew Morton
  3 siblings, 0 replies; 27+ messages in thread
From: Yury Norov @ 2015-02-22 17:24 UTC (permalink / raw)
  To: linux, klimov.linux, linux
  Cc: akpm, davem, dborkman, hannes, laijs, msalter, takahiro.akashi,
	tgraf, valentinrothberg, yury.norov, linux-kernel, chris

Currently the whole 'find_*_bit' family is located in lib/find_next_bit.c,
except for 'find_last_bit', which is in lib/find_last_bit.c. There seems
to be no major benefit in keeping it separate.
---
 lib/Makefile        |  2 +-
 lib/find_last_bit.c | 40 ----------------------------------------
 lib/find_next_bit.c | 26 +++++++++++++++++++++++++-
 3 files changed, 26 insertions(+), 42 deletions(-)
 delete mode 100644 lib/find_last_bit.c

diff --git a/lib/Makefile b/lib/Makefile
index 87eb3bf..9dafcd5 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -25,7 +25,7 @@ obj-y	+= lockref.o
 obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 	 bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o clz_ctz.o \
-	 bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \
+	 bsearch.o find_next_bit.o llist.o memweight.o kfifo.o \
 	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/find_last_bit.c b/lib/find_last_bit.c
deleted file mode 100644
index f7b594e..0000000
--- a/lib/find_last_bit.c
+++ /dev/null
@@ -1,40 +0,0 @@
-/* find_last_bit.c: fallback find next bit implementation
- *
- * Copyright (C) 2008 IBM Corporation
- * Written by Rusty Russell <rusty@rustcorp.com.au>
- * (Inspired by David Howell's find_next_bit implementation)
- *
- * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
- * size and improve performance, 2015.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#include <linux/bitops.h>
-#include <linux/export.h>
-#include <linux/kernel.h>
-
-#ifndef find_last_bit
-
-unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
-{
-	if (size) {
-		unsigned long val = BITMAP_LAST_WORD_MASK(size);
-		unsigned long idx = (size-1) / BITS_PER_LONG;
-
-		do {
-			val &= addr[idx];
-			if (val)
-				return idx * BITS_PER_LONG + __fls(val);
-
-			val = ~0ul;
-		} while (idx--);
-	}
-	return size;
-}
-EXPORT_SYMBOL(find_last_bit);
-
-#endif
diff --git a/lib/find_next_bit.c b/lib/find_next_bit.c
index cbea5ef..9ae4d40 100644
--- a/lib/find_next_bit.c
+++ b/lib/find_next_bit.c
@@ -1,8 +1,12 @@
-/* find_next_bit.c: fallback find next bit implementation
+/* bit search implementation
  *
  * Copyright (C) 2004 Red Hat, Inc. All Rights Reserved.
  * Written by David Howells (dhowells@redhat.com)
  *
+ * Copyright (C) 2008 IBM Corporation
+ * 'find_last_bit' is written by Rusty Russell <rusty@rustcorp.com.au>
+ * (Inspired by David Howell's find_next_bit implementation)
+ *
  * Rewritten by Yury Norov <yury.norov@gmail.com> to decrease
  * size and improve performance, 2015.
  *
@@ -106,6 +110,26 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
 EXPORT_SYMBOL(find_first_zero_bit);
 #endif
 
+#ifndef find_last_bit
+unsigned long find_last_bit(const unsigned long *addr, unsigned long size)
+{
+	if (size) {
+		unsigned long val = BITMAP_LAST_WORD_MASK(size);
+		unsigned long idx = (size-1) / BITS_PER_LONG;
+
+		do {
+			val &= addr[idx];
+			if (val)
+				return idx * BITS_PER_LONG + __fls(val);
+
+			val = ~0ul;
+		} while (idx--);
+	}
+	return size;
+}
+EXPORT_SYMBOL(find_last_bit);
+#endif
+
 #ifdef __BIG_ENDIAN
 
 /* include/linux/byteorder does not support "unsigned long" type */
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH v5 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c
  2015-02-22 17:24 ` [PATCH v5 0/3] lib: find_*_bit reimplementation Yury Norov
  2015-02-22 17:24   ` [PATCH v5 1/3] " Yury Norov
  2015-02-22 17:24   ` [PATCH v5 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
@ 2015-02-22 17:24   ` Yury Norov
  2015-02-24  0:40   ` [PATCH v5 0/3] lib: find_*_bit reimplementation Andrew Morton
  3 siblings, 0 replies; 27+ messages in thread
From: Yury Norov @ 2015-02-22 17:24 UTC (permalink / raw)
  To: linux, klimov.linux, linux
  Cc: akpm, davem, dborkman, hannes, laijs, msalter, takahiro.akashi,
	tgraf, valentinrothberg, yury.norov, linux-kernel, chris

This file contains the implementation of the whole find_*_bit{,_le}
family, so giving it a more generic name looks reasonable.
---
 lib/Makefile                        | 2 +-
 lib/{find_next_bit.c => find_bit.c} | 0
 2 files changed, 1 insertion(+), 1 deletion(-)
 rename lib/{find_next_bit.c => find_bit.c} (100%)

diff --git a/lib/Makefile b/lib/Makefile
index 9dafcd5..d857965 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -25,7 +25,7 @@ obj-y	+= lockref.o
 obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \
 	 bust_spinlocks.o kasprintf.o bitmap.o scatterlist.o \
 	 gcd.o lcm.o list_sort.o uuid.o flex_array.o clz_ctz.o \
-	 bsearch.o find_next_bit.o llist.o memweight.o kfifo.o \
+	 bsearch.o find_bit.o llist.o memweight.o kfifo.o \
 	 percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o
 obj-y += string_helpers.o
 obj-$(CONFIG_TEST_STRING_HELPERS) += test-string_helpers.o
diff --git a/lib/find_next_bit.c b/lib/find_bit.c
similarity index 100%
rename from lib/find_next_bit.c
rename to lib/find_bit.c
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 1/3] lib: find_*_bit reimplementation
  2015-02-22 17:24   ` [PATCH v5 1/3] " Yury Norov
@ 2015-02-23 21:50     ` Rasmus Villemoes
  2015-02-24  0:29       ` George Spelvin
  0 siblings, 1 reply; 27+ messages in thread
From: Rasmus Villemoes @ 2015-02-23 21:50 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux, klimov.linux, akpm, davem, dborkman, hannes, laijs,
	msalter, takahiro.akashi, tgraf, valentinrothberg, linux-kernel,
	chris

On Sun, Feb 22 2015, Yury Norov <yury.norov@gmail.com> wrote:

> The new implementation takes less space in the source file (see diffstat)
> and in the object file; for me it's 710 vs 453 bytes of text.
> It also shows better performance.
>
> find_last_bit description fixed due to obvious typo.
>

Hm, sorry, I should probably have reminded you to include linux/bitmap.h:

lib/find_last_bit.c: In function ‘find_last_bit’:
lib/find_last_bit.c:25:3: error: implicit declaration of function ‘BITMAP_LAST_WORD_MASK’ [-Werror=implicit-function-declaration]
lib/find_next_bit.c: In function ‘_find_next_bit’:
lib/find_next_bit.c:37:2: error: implicit declaration of function ‘BITMAP_FIRST_WORD_MASK’ [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors
cc1: some warnings being treated as errors
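
(Both macros live in include/linux/bitmap.h, so the fix is presumably just
adding, near the top of lib/find_last_bit.c and lib/find_next_bit.c:

	#include <linux/bitmap.h>	/* BITMAP_{FIRST,LAST}_WORD_MASK */

after which the generic helpers build standalone.)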

With that fixed:

Reviewed-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>

That also applies to 2/3 and 3/3.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 1/3] lib: find_*_bit reimplementation
  2015-02-23 21:50     ` Rasmus Villemoes
@ 2015-02-24  0:29       ` George Spelvin
  0 siblings, 0 replies; 27+ messages in thread
From: George Spelvin @ 2015-02-24  0:29 UTC (permalink / raw)
  To: linux, yury.norov
  Cc: akpm, chris, davem, dborkman, hannes, klimov.linux, laijs,
	linux-kernel, linux, msalter, takahiro.akashi, tgraf,
	valentinrothberg

Also add my

Reviewed-by: George Spelvin <linux@horizon.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 0/3] lib: find_*_bit reimplementation
  2015-02-22 17:24 ` [PATCH v5 0/3] lib: find_*_bit reimplementation Yury Norov
                     ` (2 preceding siblings ...)
  2015-02-22 17:24   ` [PATCH v5 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
@ 2015-02-24  0:40   ` Andrew Morton
  2015-03-08 18:17     ` Yury Norov
  3 siblings, 1 reply; 27+ messages in thread
From: Andrew Morton @ 2015-02-24  0:40 UTC (permalink / raw)
  To: Yury Norov
  Cc: linux, klimov.linux, linux, davem, dborkman, hannes, laijs,
	msalter, takahiro.akashi, tgraf, valentinrothberg, linux-kernel,
	chris

On Sun, 22 Feb 2015 20:24:14 +0300 Yury Norov <yury.norov@gmail.com> wrote:

> This patchset does rework find_bit functions family to achieve better
> performance, and decrease size of text. All rework is done in patch 1.
> Patches 2 and 3 are about code moving and renaming.
> 
> It was boot-tested on x86_64 and MIPS (big-endian) machines.
> Performance tests were ran on userspace with code like this:
> 
> 	/* addr[] is filled from /dev/urandom */
> 	start = clock();
> 	while (ret < nbits)
> 		ret = find_next_bit(addr, nbits, ret + 1);
> 
> 	end = clock();
> 	printf("%ld\t", (unsigned long) end - start);
> 
> On an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz, the measurements are as follows:
> (for find_next_bit, nbits is 8M, for find_first_bit - 80K)
> 
> 	find_next_bit:		find_first_bit:
> 	new	current		new	current
> 	26932	43151		14777	14925
> 	26947	43182		14521	15423
> 	26507	43824		15053	14705
> 	27329	43759		14473	14777
> 	26895	43367		14847	15023
> 	26990	43693		15103	15163
> 	26775	43299		15067	15232
> 	27282	42752		14544	15121
> 	27504	43088		14644	14858
> 	26761	43856		14699	15193
> 	26692	43075		14781	14681
> 	27137	42969		14451	15061
> 	...			...
> 
> find_next_bit performance gain is 35-40%;
> find_first_bit - no measurable difference.
> 
> On ARM machine, there is arch-specific implementation for find_bit.
> To disable it, and use generic one, please apply next patch:

I avoid putting patches into changelogs because in some situations
patch(1) tries to apply the embedded patch when you apply the real
one.  Maybe you can share the userspace test harness with someone who
has access to an arm machine?

Patches 2 and 3 are missing your signed-off-by:.  I added it to my
copies of those patches.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH v5 0/3] lib: find_*_bit reimplementation
  2015-02-24  0:40   ` [PATCH v5 0/3] lib: find_*_bit reimplementation Andrew Morton
@ 2015-03-08 18:17     ` Yury Norov
  0 siblings, 0 replies; 27+ messages in thread
From: Yury Norov @ 2015-03-08 18:17 UTC (permalink / raw)
  To: linux, klimov.linux, linux, akpm
  Cc: davem, dborkman, hannes, laijs, msalter, takahiro.akashi, tgraf,
	valentinrothberg, yury.norov, linux-kernel, chris

It looks like I screwed up the performance measurements.

First, I measured the performance of 'find_next_zero_bit' while writing about
'find_next_bit'. I really thought it didn't matter; I'm sorry.

Second, to run the test in userspace I took ffs and ffz from the kernel, and it
looks like all of the performance difference is caused by them.  A simple
measurement of the current 'find_next_bit' and 'find_next_zero_bit' shows the
same 35% difference, although their code is virtually identical.  The trick is
that the new implementation doesn't use ffz, which is slow for me (because I
got the generic implementation), so 'find_next_zero_bit' only looks faster,
while it's not.  In the kernel, ffs and ffz are both fast.  I'm sorry again.
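
(For reference, the generic ffz in include/asm-generic/bitops/ffz.h is just
the complement trick:

	#define ffz(x)  __ffs(~(x))

so the slowness presumably came from pairing it with a plain-C __ffs; in the
kernel most architectures override __ffs with a single instruction, e.g. bsf
on x86.)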

When I ran this test in kernel mode, I found the measurements weird. I tested
on a Core i7-3770@3.40GHz and a Core i7-2630QM@2.00GHz. The first machine
showed noisy but roughly equal results (I cannot attach them now); the second
one is more stable, and it looks like performance degrades by about 20%.

	find_next_bit	find_next_zero_bit
	old	new	old	new
	25	30	24	29
	25	30	24	29
	25	30	25	29
	26	29	24	29
	25	30	24	29
	26	30	24	29
	26	30	24	30
	25	30	25	29
	26	30	25	29
	25	30	25	30

The module I used for testing is below.

To Alexey:
Could you be so kind as to run this test on your Odroid? It would also be
interesting to see whether there is a real difference between the generic and
the platform-specific implementations.

---
 lib/Kconfig.debug   |   8 +++
 lib/Makefile        |   1 +
 lib/test_find_bit.c | 189 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 198 insertions(+)
 create mode 100644 lib/test_find_bit.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index c5cefb3..5d78dbb 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1669,6 +1669,14 @@ config DMA_API_DEBUG
 
 	  If unsure, say N.
 
+config TEST_FIND_BIT
+	tristate "Test find_bit family performance"
+	default n
+	help
+	  This builds the "test_find_bit" module, a find_bit performance test.
+
+	  If unsure, say N.
+
 config TEST_LKM
 	tristate "Test module loading with 'hello world' module"
 	default n
diff --git a/lib/Makefile b/lib/Makefile
index d857965..9186da4 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -34,6 +34,7 @@ obj-$(CONFIG_TEST_HEXDUMP) += test-hexdump.o
 obj-y += kstrtox.o
 obj-$(CONFIG_TEST_BPF) += test_bpf.o
 obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o
+obj-$(CONFIG_TEST_FIND_BIT) += test_find_bit.o
 obj-$(CONFIG_TEST_KASAN) += test_kasan.o
 obj-$(CONFIG_TEST_KSTRTOX) += test-kstrtox.o
 obj-$(CONFIG_TEST_LKM) += test_module.o
diff --git a/lib/test_find_bit.c b/lib/test_find_bit.c
new file mode 100644
index 0000000..0c04022
--- /dev/null
+++ b/lib/test_find_bit.c
@@ -0,0 +1,189 @@
+/*
+ * This module compares old vs. new 'find_next_bit' and 'find_next_zero_bit' performance.
+ */
+
+#include <linux/bitops.h>
+#include <linux/init.h>
+#include <linux/random.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+
+#define BITOP_WORD(nr)         ((nr) / BITS_PER_LONG)
+
+unsigned long find_next_bit_old(const unsigned long *addr, unsigned long size,
+			    unsigned long offset)
+{
+	const unsigned long *p = addr + BITOP_WORD(offset);
+	unsigned long result = offset & ~(BITS_PER_LONG-1);
+	unsigned long tmp;
+
+	if (offset >= size)
+		return size;
+	size -= result;
+	offset %= BITS_PER_LONG;
+	if (offset) {
+		tmp = *(p++);
+		tmp &= (~0UL << offset);
+		if (size < BITS_PER_LONG)
+			goto found_first;
+		if (tmp)
+			goto found_middle;
+		size -= BITS_PER_LONG;
+		result += BITS_PER_LONG;
+	}
+	while (size & ~(BITS_PER_LONG-1)) {
+		if ((tmp = *(p++)))
+			goto found_middle;
+		result += BITS_PER_LONG;
+		size -= BITS_PER_LONG;
+	}
+	if (!size)
+		return result;
+	tmp = *p;
+
+found_first:
+	tmp &= (~0UL >> (BITS_PER_LONG - size));
+	if (tmp == 0UL)		/* Are any bits set? */
+		return result + size;	/* Nope. */
+found_middle:
+	return result + __ffs(tmp);
+}
+
+unsigned long find_next_zero_bit_old(const unsigned long *addr, unsigned long size,
+				 unsigned long offset)
+{
+	const unsigned long *p = addr + BITOP_WORD(offset);
+	unsigned long result = offset & ~(BITS_PER_LONG-1);
+	unsigned long tmp;
+
+	if (offset >= size)
+		return size;
+	size -= result;
+	offset %= BITS_PER_LONG;
+	if (offset) {
+		tmp = *(p++);
+		tmp |= ~0UL >> (BITS_PER_LONG - offset);
+		if (size < BITS_PER_LONG)
+			goto found_first;
+		if (~tmp)
+			goto found_middle;
+		size -= BITS_PER_LONG;
+		result += BITS_PER_LONG;
+	}
+	while (size & ~(BITS_PER_LONG-1)) {
+		if (~(tmp = *(p++)))
+			goto found_middle;
+		result += BITS_PER_LONG;
+		size -= BITS_PER_LONG;
+	}
+	if (!size)
+		return result;
+	tmp = *p;
+
+found_first:
+	tmp |= ~0UL << size;
+	if (tmp == ~0UL)	/* Are any bits zero? */
+		return result + size;	/* Nope. */
+found_middle:
+	return result + ffz(tmp);
+}
+static unsigned long _find_next_bit(const unsigned long *addr,
+		unsigned long nbits, unsigned long start, unsigned long invert)
+{
+	unsigned long tmp;
+
+	if (!nbits || start >= nbits)
+		return nbits;
+
+	tmp = addr[start / BITS_PER_LONG] ^ invert;	/* invert: 0 finds set bits, ~0UL finds zero bits */
+
+	/* Handle 1st word. */
+	tmp &= BITMAP_FIRST_WORD_MASK(start);
+	start = round_down(start, BITS_PER_LONG);
+
+	while (!tmp) {
+		start += BITS_PER_LONG;
+		if (start >= nbits)
+			return nbits;
+
+		tmp = addr[start / BITS_PER_LONG] ^ invert;
+	}
+
+	return min(start + __ffs(tmp), nbits);	/* last word may have bits beyond nbits; clamp */
+}
+
+unsigned long find_next_bit_new(const unsigned long *addr, unsigned long size,
+			    unsigned long offset)
+{
+	return _find_next_bit(addr, size, offset, 0UL);
+}
+
+unsigned long find_next_zero_bit_new(const unsigned long *addr, unsigned long size,
+			    unsigned long offset)
+{
+	return _find_next_bit(addr, size, offset, ~0UL);
+}
+
+static int __init test_module_init(void)
+{
+	long long start, old, new;
+	unsigned long *addr;
+	unsigned long ret, i, nbits = 30*1024*1024;
+
+	addr = alloc_pages_exact(nbits/8, GFP_KERNEL);
+	if (!addr) {
+		pr_warn("No mem\n");
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < 10; i++) {
+		get_random_bytes(addr, nbits/8);
+
+		ret = 0;
+		start = jiffies;
+		while (ret < nbits)
+			ret = find_next_zero_bit_old(addr, nbits, ret + 1);
+
+		old = jiffies - start;
+
+		ret = 0;
+		start = jiffies;
+		while (ret < nbits)
+			ret = find_next_zero_bit_new(addr, nbits, ret + 1);
+
+		new = jiffies - start;
+		pr_warn("%lld\t%lld\n", old, new);
+	}
+
+	pr_warn("\n");
+
+	for (i = 0; i < 10; i++) {
+		get_random_bytes(addr, nbits/8);
+
+		ret = 0;
+		start = jiffies;
+		while (ret < nbits)
+			ret = find_next_bit_old(addr, nbits, ret + 1);
+
+		old = jiffies - start;
+
+		ret = 0;
+		start = jiffies;
+		while (ret < nbits)
+			ret = find_next_bit_new(addr, nbits, ret + 1);
+
+		new = jiffies - start;
+
+		pr_warn("%lld\t%lld\n", old, new);
+	}
+
+	free_pages_exact(addr, nbits/8);
+	return 0;
+}
+module_init(test_module_init);
+
+static void __exit test_module_exit(void) {}
+module_exit(test_module_exit);
+
+MODULE_AUTHOR("Yury Norov <yury.norov@gmail.com>");
+MODULE_LICENSE("GPL");
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2015-03-08 18:17 UTC | newest]

Thread overview: 27+ messages
2015-02-08 14:10 [PATCH v3 0/3] lib: find_*_bit reimplementation Yury Norov
2015-02-08 14:10 ` [PATCH v3 1/3] " Yury Norov
2015-02-08 18:48   ` George Spelvin
2015-02-09  8:32   ` George Spelvin
2015-02-09 11:53     ` Rasmus Villemoes
2015-02-09 16:45       ` George Spelvin
2015-02-11 22:14         ` Rasmus Villemoes
2015-02-11 23:05       ` Yury
2015-02-12  8:15         ` George Spelvin
2015-02-12  9:58           ` Rasmus Villemoes
2015-02-12 23:46             ` George Spelvin
2015-02-13 10:13               ` Rasmus Villemoes
2015-02-08 14:10 ` [PATCH v3 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
2015-02-08 14:10 ` [PATCH v3 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
2015-02-17  2:35 ` [PATCH v4 0/3] lib: find_*_bit reimplementation Yury Norov
2015-02-17  2:35   ` [PATCH v4 1/3] " Yury Norov
2015-02-18 17:57     ` Rasmus Villemoes
2015-02-17  2:35   ` [PATCH v4 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
2015-02-17  2:35   ` [PATCH v4 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
2015-02-22 17:24 ` [PATCH v5 0/3] lib: find_*_bit reimplementation Yury Norov
2015-02-22 17:24   ` [PATCH v5 1/3] " Yury Norov
2015-02-23 21:50     ` Rasmus Villemoes
2015-02-24  0:29       ` George Spelvin
2015-02-22 17:24   ` [PATCH v5 2/3] lib: move find_last_bit to lib/find_next_bit.c Yury Norov
2015-02-22 17:24   ` [PATCH v5 3/3] lib: rename lib/find_next_bit.c to lib/find_bit.c Yury Norov
2015-02-24  0:40   ` [PATCH v5 0/3] lib: find_*_bit reimplementation Andrew Morton
2015-03-08 18:17     ` Yury Norov
