All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yury Norov <yury.norov@gmail.com>
To: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>,
	"linux-m68k@lists.linux-m68k.org"
	<linux-m68k@lists.linux-m68k.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-sh@vger.kernel.org" <linux-sh@vger.kernel.org>,
	"linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Yoshinori Sato <ysato@users.sourceforge.jp>,
	Rich Felker <dalias@libc.org>, Arnd Bergmann <arnd@arndb.de>,
	Dennis Zhou <dennis@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Wolfram Sang <wsa+renesas@sang-engineering.com>,
	David Sterba <dsterba@suse.com>,
	Stefano Brivio <sbrivio@redhat.com>,
	"Ma, Jianpeng" <jianpeng.ma@intel.com>,
	Wei Yang <richard.weiyang@linux.alibaba.com>,
	Josh Poimboeuf <jpoimboe@redhat.com>,
	John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>,
	Joe Perches <joe@perches.com>, Alexey Klimov <aklimov@redhat.com>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>
Subject: Re: [PATCH 7/8] lib: add fast path for find_next_*_bit()
Date: Mon, 1 Feb 2021 23:02:00 -0800	[thread overview]
Message-ID: <CAAH8bW-Cm9MSkicAwbY3qYzLaYi6jbqx96zdb_pVi6L8dFtnWQ@mail.gmail.com> (raw)
In-Reply-To: <YBgqwwrfWqU8wx+s@smile.fi.intel.com>

[ CC Alexey Klimov, andRasmus Villemoes as they also may be interested ]

On Mon, Feb 1, 2021 at 8:22 AM Andy Shevchenko
<andriy.shevchenko@linux.intel.com> wrote:

> On Mon, Feb 01, 2021 at 04:02:30PM +0000, David Laight wrote:
> > From: Andy Shevchenko
> > > Sent: 01 February 2021 13:49
> > > On Sat, Jan 30, 2021 at 11:17:18AM -0800, Yury Norov wrote:
> > > > Similarly to bitmap functions, find_next_*_bit() users will benefit
> > > > if we'll handle a case of bitmaps that fit into a single word. In the
> > > > very best case, the compiler may replace a function call with a
> > > > single ffs or ffz instruction.
> > >
> > > Would be nice to have the examples how it reduces the actual code size (based
> > > on the existing code in kernel, especially in widely used frameworks /
> > > subsystems, like PCI).
>>
> > I bet it makes the kernel bigger but very slightly faster.
> > But the fact that the wrappers end up in the i-cache may
> > mean that inlining actually makes it slower for some calling
> > sequences.
>
> > If a bitmap fits in a single word (as a compile-time constant)
> > then you should (probably) be using different functions if
> > you care about performance.
>
> Isn't this patch series exactly about it?

Probably, David meant something like find_first_bit_fast_path().
Not sure that this approach would be better than pure inlining.

Addressing questions above.

My arm64 build for qemu:
Before:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .head.text    00010000  ffff800010000000  ffff800010000000  00010000  2**16
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .text         011603a8  ffff800010010000  ffff800010010000  00020000  2**16
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .got.plt      00000018  ffff8000111703a8  ffff8000111703a8  011803a8  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  3 .rodata       007a29b2  ffff800011180000  ffff800011180000  01190000  2**12
                  CONTENTS, ALLOC, LOAD, DATA
  4 .pci_fixup    000025f0  ffff8000119229c0  ffff8000119229c0  019329c0  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 __ksymtab     000100d4  ffff800011924fb0  ffff800011924fb0  01934fb0  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 __ksymtab_gpl 000147cc  ffff800011935084  ffff800011935084  01945084  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 __ksymtab_strings 0003b0be  ffff800011949850  ffff800011949850
01959850  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 __param       00003e58  ffff800011984910  ffff800011984910  01994910  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  9 __modver      00000678  ffff800011988768  ffff800011988768  01998768  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 10 __ex_table    00002270  ffff800011988de0  ffff800011988de0  01998de0  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 11 .notes        0000003c  ffff80001198b050  ffff80001198b050  0199b050  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 12 .init.text    0007b09c  ffff8000119a0000  ffff8000119a0000  019a0000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 13 .exit.text    00004120  ffff800011a1b09c  ffff800011a1b09c  01a1b09c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
...

After:
Idx Name          Size      VMA               LMA               File off  Algn
  0 .head.text    00010000  ffff800010000000  ffff800010000000  00010000  2**16
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .text         011613a8  ffff800010010000  ffff800010010000  00020000  2**16
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  2 .got.plt      00000018  ffff8000111713a8  ffff8000111713a8  011813a8  2**3
                  CONTENTS, ALLOC, LOAD, DATA
  3 .rodata       007a26b2  ffff800011180000  ffff800011180000  01190000  2**12
                  CONTENTS, ALLOC, LOAD, DATA
  4 .pci_fixup    000025f0  ffff8000119226c0  ffff8000119226c0  019326c0  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 __ksymtab     000100bc  ffff800011924cb0  ffff800011924cb0  01934cb0  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 __ksymtab_gpl 000147cc  ffff800011934d6c  ffff800011934d6c  01944d6c  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 __ksymtab_strings 0003b09b  ffff800011949538  ffff800011949538
01959538  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 __param       00003e58  ffff8000119845d8  ffff8000119845d8  019945d8  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  9 __modver      00000678  ffff800011988430  ffff800011988430  01998430  2**3
                  CONTENTS, ALLOC, LOAD, DATA
 10 __ex_table    00002270  ffff800011988aa8  ffff800011988aa8  01998aa8  2**3
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 11 .notes        0000003c  ffff80001198ad18  ffff80001198ad18  0199ad18  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
 12 .init.text    0007b1a8  ffff8000119a0000  ffff8000119a0000  019a0000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
 13 .exit.text    00004120  ffff800011a1b1a8  ffff800011a1b1a8  01a1b1a8  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE

The size of  .text is increased by 0x1000 bytes, or 0.02%
The size of .rodata is decreased by 0x300 bytes, or 0.06%
The size of  __ksymtab is decreased by 24 bytes, or 0.018%

So the cost of this fast path is ~3.3K.

Regarding code generation, this is the quite typical find_bit() user:

unsigned int cpumask_next(int n, const struct cpumask *srcp)
{
        /* -1 is a legal arg here. */
        if (n != -1)
                cpumask_check(n);
        return find_next_bit(cpumask_bits(srcp), nr_cpumask_bits, n + 1);
}
EXPORT_SYMBOL(cpumask_next);

Before:
0000000000000000 <cpumask_next>:
   0:   a9bf7bfd        stp     x29, x30, [sp, #-16]!
   4:   11000402        add     w2, w0, #0x1
   8:   aa0103e0        mov     x0, x1
   c:   d2800401        mov     x1, #0x40                       // #64
  10:   910003fd        mov     x29, sp
  14:   93407c42        sxtw    x2, w2
  18:   94000000        bl      0 <find_next_bit>
  1c:   a8c17bfd        ldp     x29, x30, [sp], #16
  20:   d65f03c0        ret
  24:   d503201f        nop

After:
0000000000000140 <cpumask_next>:
 140:   11000400        add     w0, w0, #0x1
 144:   93407c00        sxtw    x0, w0
 148:   f100fc1f        cmp     x0, #0x3f
 14c:   54000168        b.hi    178 <cpumask_next+0x38>  // b.pmore
 150:   f9400023        ldr     x3, [x1]
 154:   92800001        mov     x1, #0xffffffffffffffff         // #-1
 158:   9ac02020        lsl     x0, x1, x0
 15c:   52800802        mov     w2, #0x40                       // #64
 160:   8a030001        and     x1, x0, x3
 164:   dac00020        rbit    x0, x1
 168:   f100003f        cmp     x1, #0x0
 16c:   dac01000        clz     x0, x0
 170:   1a800040        csel    w0, w2, w0, eq  // eq = none
 174:   d65f03c0        ret
 178:   52800800        mov     w0, #0x40                       // #64
 17c:   d65f03c0        ret

find_next_bit() call is removed with the cost of 6 instructions.
(And I suspect we can improve the GENMASK() for better code generation.)
find_next_bit() itself is 41 instructions, so I believe this fast path
would bring
value for many users.

On the other hand, if someone is limited with I-cache and wants to avoid
generating fast paths, I think it's worth to make SMALL_CONST() configurable,
which would also disable fast paths in lib/bitmap.c. We may rely on -Os flag, or
enable fast path in config explicitly:

diff --git a/include/asm-generic/bitsperlong.h
b/include/asm-generic/bitsperlong.h
index 0eeb77544f1d..209e531074c1 100644
--- a/include/asm-generic/bitsperlong.h
+++ b/include/asm-generic/bitsperlong.h
@@ -23,6 +23,10 @@
 #define BITS_PER_LONG_LONG 64
 #endif

+#ifdef CONFIG_FAST_PATH
 #define SMALL_CONST(n) (__builtin_constant_p(n) && (unsigned long)(n)
< BITS_PER_LONG)
+#else
+#define SMALL_CONST(n) (0)
+#endif

 #endif /* __ASM_GENERIC_BITS_PER_LONG */
diff --git a/lib/Kconfig b/lib/Kconfig
index a38cc61256f1..c9629b3ebce8 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -39,6 +39,14 @@ config PACKING

          When in doubt, say N.

+config FAST_PATH
+       bool "Enable fast path code generation"
+       default y
+       help
+         This option enables fast path optimization with the cost
+         of increasing the text section.
+
 config BITREVERSE
        tristate

> With Best Regards,
> Andy Shevchenko

  reply	other threads:[~2021-02-02  7:03 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-01-30 19:17 [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps Yury Norov
2021-01-30 19:17 ` [PATCH 1/8] tools: disable -Wno-type-limits Yury Norov
2021-01-30 19:17 ` [PATCH 2/8] tools: bitmap: sync function declarations with linux kernel Yury Norov
2021-01-30 19:17 ` [PATCH 3/8] arch: rearrange headers inclusion order in asm/bitops for m68k and sh Yury Norov
2021-01-30 19:17 ` [PATCH 4/8] lib: introduce BITS_{FIRST,LAST} macro Yury Norov
2021-02-01 13:42   ` Andy Shevchenko
2021-01-30 19:17 ` [PATCH 5/8] bitsperlong.h: introduce SMALL_CONST() macro Yury Norov
2021-02-01 13:45   ` Andy Shevchenko
2021-02-02  7:10     ` Yury Norov
2021-01-30 19:17 ` [PATCH 6/8] lib: inline _find_next_bit() wrappers Yury Norov
2021-02-01 13:47   ` Andy Shevchenko
2021-02-02  7:13     ` Yury Norov
2021-01-30 19:17 ` [PATCH 7/8] lib: add fast path for find_next_*_bit() Yury Norov
2021-02-01 13:49   ` Andy Shevchenko
2021-02-01 16:02     ` David Laight
2021-02-01 16:22       ` 'Andy Shevchenko'
2021-02-02  7:02         ` Yury Norov [this message]
2021-01-30 19:17 ` [PATCH 8/8] lib: add fast path for find_first_*_bit() and find_last_bit() Yury Norov
2021-02-01 13:50   ` Andy Shevchenko
2021-02-15 21:30 ` [RESEND PATCH v2 0/6] lib/find_bit: fast path for small bitmaps Yury Norov
2021-02-16  9:14   ` Andy Shevchenko
2021-02-16 18:00     ` Yury Norov
2021-02-17 10:33       ` Andy Shevchenko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAAH8bW-Cm9MSkicAwbY3qYzLaYi6jbqx96zdb_pVi6L8dFtnWQ@mail.gmail.com \
    --to=yury.norov@gmail.com \
    --cc=David.Laight@aculab.com \
    --cc=aklimov@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=arnd@arndb.de \
    --cc=dalias@libc.org \
    --cc=dennis@kernel.org \
    --cc=dsterba@suse.com \
    --cc=geert@linux-m68k.org \
    --cc=glaubitz@physik.fu-berlin.de \
    --cc=jianpeng.ma@intel.com \
    --cc=joe@perches.com \
    --cc=jpoimboe@redhat.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-m68k@lists.linux-m68k.org \
    --cc=linux-sh@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=richard.weiyang@linux.alibaba.com \
    --cc=sbrivio@redhat.com \
    --cc=wsa+renesas@sang-engineering.com \
    --cc=ysato@users.sourceforge.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.