All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yury Norov <yury.norov@gmail.com>
To: Will Deacon <will@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-arch@vger.kernel.org, Alexey Klimov <aklimov@redhat.com>
Subject: Re: [PATCH] arm64: enable GENERIC_FIND_FIRST_BIT
Date: Mon, 7 Dec 2020 17:59:16 -0800	[thread overview]
Message-ID: <CAAH8bW-fb0wPwwvo8P8VW33zV=Wi_LPWxdJH8y2wdGGqPE+3nA@mail.gmail.com> (raw)
In-Reply-To: <20201207112530.GB4379@willie-the-truck>

(CC: Alexey Klimov)

On Mon, Dec 7, 2020 at 3:25 AM Will Deacon <will@kernel.org> wrote:
>
> On Sat, Dec 05, 2020 at 08:54:06AM -0800, Yury Norov wrote:
> > ARM64 doesn't implement find_first_{zero}_bit in arch code and doesn't
> > enable it in config. It leads to using find_next_bit() which is less
> > efficient:
>
> [...]
>
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index 1515f6f153a0..2b90ef1f548e 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -106,6 +106,7 @@ config ARM64
> >       select GENERIC_CPU_AUTOPROBE
> >       select GENERIC_CPU_VULNERABILITIES
> >       select GENERIC_EARLY_IOREMAP
> > +     select GENERIC_FIND_FIRST_BIT
>
> Does this actually make any measurable difference? The disassembly with
> or without this is _very_ similar for me (clang 11).
>
> Will

On A-53 find_first_bit() is almost twice faster than find_next_bit(),
according to
lib/find_bit_benchmark. (Thanks to Alexey for testing.)

Yury

---

Tested-by: Alexey Klimov <aklimov@redhat.com>

Start testing find_bit() with random-filled bitmap
[7126084.864616] find_next_bit:                 9653351 ns, 164280 iterations
[7126084.881146] find_next_zero_bit:            9591974 ns, 163401 iterations
[7126084.893859] find_last_bit:                 5778627 ns, 164280 iterations
[7126084.948181] find_first_bit:               47389224 ns,  16357 iterations
[7126084.958975] find_next_and_bit:             3875849 ns,  73487 iterations
[7126084.965884]
                 Start testing find_bit() with sparse bitmap
[7126084.973474] find_next_bit:                  109879 ns,    655 iterations
[7126084.999365] find_next_zero_bit:           18968440 ns, 327026 iterations
[7126085.006351] find_last_bit:                   80503 ns,    655 iterations
[7126085.032315] find_first_bit:               19048193 ns,    655 iterations
[7126085.039303] find_next_and_bit:               82628 ns,      1 iterations

with enabled GENERIC_FIND_FIRST_BIT:

               Start testing find_bit() with random-filled bitmap
[   84.095335] find_next_bit:                 9600970 ns, 163770 iterations
[   84.111695] find_next_zero_bit:            9613137 ns, 163911 iterations
[   84.124143] find_last_bit:                 5713907 ns, 163770 iterations
[   84.158068] find_first_bit:               27193319 ns,  16406 iterations
[   84.168663] find_next_and_bit:             3863814 ns,  73671 iterations
[   84.175392]
               Start testing find_bit() with sparse bitmap
[   84.182660] find_next_bit:                  112334 ns,    656 iterations
[   84.208375] find_next_zero_bit:           18976981 ns, 327025 iterations
[   84.215184] find_last_bit:                   79584 ns,    656 iterations
[   84.233005] find_first_bit:               11082437 ns,    656 iterations
[   84.239821] find_next_and_bit:               82209 ns,      1 iterations

root@pine:~# cpupower -c all frequency-info | grep asserted
  current CPU frequency: 648 MHz (asserted by call to hardware)
  current CPU frequency: 648 MHz (asserted by call to hardware)
  current CPU frequency: 648 MHz (asserted by call to hardware)
  current CPU frequency: 648 MHz (asserted by call to hardware)
root@pine:~# lscpu
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              1
Core(s) per socket:              4
Socket(s):                       1
Vendor ID:                       ARM
Model:                           4
Model name:                      Cortex-A53
Stepping:                        r0p4
CPU max MHz:                     1152.0000
CPU min MHz:                     648.0000
BogoMIPS:                        48.00
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm aes pmull sha1 sha2
crc32 cpuid

WARNING: multiple messages have this Message-ID (diff)
From: Yury Norov <yury.norov@gmail.com>
To: Will Deacon <will@kernel.org>
Cc: linux-arch@vger.kernel.org,
	Catalin Marinas <catalin.marinas@arm.com>,
	Alexey Klimov <aklimov@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCH] arm64: enable GENERIC_FIND_FIRST_BIT
Date: Mon, 7 Dec 2020 17:59:16 -0800	[thread overview]
Message-ID: <CAAH8bW-fb0wPwwvo8P8VW33zV=Wi_LPWxdJH8y2wdGGqPE+3nA@mail.gmail.com> (raw)
In-Reply-To: <20201207112530.GB4379@willie-the-truck>

(CC: Alexey Klimov)

On Mon, Dec 7, 2020 at 3:25 AM Will Deacon <will@kernel.org> wrote:
>
> On Sat, Dec 05, 2020 at 08:54:06AM -0800, Yury Norov wrote:
> > ARM64 doesn't implement find_first_{zero}_bit in arch code and doesn't
> > enable it in config. It leads to using find_next_bit() which is less
> > efficient:
>
> [...]
>
> > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> > index 1515f6f153a0..2b90ef1f548e 100644
> > --- a/arch/arm64/Kconfig
> > +++ b/arch/arm64/Kconfig
> > @@ -106,6 +106,7 @@ config ARM64
> >       select GENERIC_CPU_AUTOPROBE
> >       select GENERIC_CPU_VULNERABILITIES
> >       select GENERIC_EARLY_IOREMAP
> > +     select GENERIC_FIND_FIRST_BIT
>
> Does this actually make any measurable difference? The disassembly with
> or without this is _very_ similar for me (clang 11).
>
> Will

On A-53 find_first_bit() is almost twice faster than find_next_bit(),
according to
lib/find_bit_benchmark. (Thanks to Alexey for testing.)

Yury

---

Tested-by: Alexey Klimov <aklimov@redhat.com>

Start testing find_bit() with random-filled bitmap
[7126084.864616] find_next_bit:                 9653351 ns, 164280 iterations
[7126084.881146] find_next_zero_bit:            9591974 ns, 163401 iterations
[7126084.893859] find_last_bit:                 5778627 ns, 164280 iterations
[7126084.948181] find_first_bit:               47389224 ns,  16357 iterations
[7126084.958975] find_next_and_bit:             3875849 ns,  73487 iterations
[7126084.965884]
                 Start testing find_bit() with sparse bitmap
[7126084.973474] find_next_bit:                  109879 ns,    655 iterations
[7126084.999365] find_next_zero_bit:           18968440 ns, 327026 iterations
[7126085.006351] find_last_bit:                   80503 ns,    655 iterations
[7126085.032315] find_first_bit:               19048193 ns,    655 iterations
[7126085.039303] find_next_and_bit:               82628 ns,      1 iterations

with enabled GENERIC_FIND_FIRST_BIT:

               Start testing find_bit() with random-filled bitmap
[   84.095335] find_next_bit:                 9600970 ns, 163770 iterations
[   84.111695] find_next_zero_bit:            9613137 ns, 163911 iterations
[   84.124143] find_last_bit:                 5713907 ns, 163770 iterations
[   84.158068] find_first_bit:               27193319 ns,  16406 iterations
[   84.168663] find_next_and_bit:             3863814 ns,  73671 iterations
[   84.175392]
               Start testing find_bit() with sparse bitmap
[   84.182660] find_next_bit:                  112334 ns,    656 iterations
[   84.208375] find_next_zero_bit:           18976981 ns, 327025 iterations
[   84.215184] find_last_bit:                   79584 ns,    656 iterations
[   84.233005] find_first_bit:               11082437 ns,    656 iterations
[   84.239821] find_next_and_bit:               82209 ns,      1 iterations

root@pine:~# cpupower -c all frequency-info | grep asserted
  current CPU frequency: 648 MHz (asserted by call to hardware)
  current CPU frequency: 648 MHz (asserted by call to hardware)
  current CPU frequency: 648 MHz (asserted by call to hardware)
  current CPU frequency: 648 MHz (asserted by call to hardware)
root@pine:~# lscpu
Architecture:                    aarch64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
CPU(s):                          4
On-line CPU(s) list:             0-3
Thread(s) per core:              1
Core(s) per socket:              4
Socket(s):                       1
Vendor ID:                       ARM
Model:                           4
Model name:                      Cortex-A53
Stepping:                        r0p4
CPU max MHz:                     1152.0000
CPU min MHz:                     648.0000
BogoMIPS:                        48.00
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Not affected
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm aes pmull sha1 sha2
crc32 cpuid

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2020-12-08  2:00 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-05 16:54 [PATCH] arm64: enable GENERIC_FIND_FIRST_BIT Yury Norov
2020-12-05 16:54 ` Yury Norov
2020-12-07 11:25 ` Will Deacon
2020-12-07 11:25   ` Will Deacon
2020-12-08  1:59   ` Yury Norov [this message]
2020-12-08  1:59     ` Yury Norov
2020-12-08 10:35     ` Will Deacon
2020-12-08 10:35       ` Will Deacon
2021-02-24  5:27       ` Yury Norov
2021-02-24  5:27         ` Yury Norov
2021-02-24  9:33         ` Will Deacon
2021-02-24  9:33           ` Will Deacon
2021-02-24 11:52 ` Alexander Lobakin
2021-02-24 11:52   ` Alexander Lobakin
2021-02-24 15:44   ` Yury Norov
2021-02-24 15:44     ` Yury Norov
2021-02-25 11:53     ` Alexander Lobakin
2021-02-25 11:53       ` Alexander Lobakin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAAH8bW-fb0wPwwvo8P8VW33zV=Wi_LPWxdJH8y2wdGGqPE+3nA@mail.gmail.com' \
    --to=yury.norov@gmail.com \
    --cc=aklimov@redhat.com \
    --cc=catalin.marinas@arm.com \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.