All of lore.kernel.org
 help / color / mirror / Atom feed
From: Yury Norov <yury.norov@gmail.com>
To: Petr Mladek <pmladek@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-mmc@vger.kernel.org, linux-perf-users@vger.kernel.org,
	kvm@vger.kernel.org,
	"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
	Alexander Lobakin <alobakin@pm.me>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Alexey Klimov <aklimov@redhat.com>,
	Andrea Merello <andrea.merello@gmail.com>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Arnd Bergmann <arnd@arndb.de>, Ben Gardon <bgardon@google.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Brian Cain <bcain@codeaurora.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Christoph Lameter <cl@linux.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	Dennis Zhou <dennis@kernel.org>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Heiko Carstens <hca@linux.ibm.com>,
	Ian Rogers <irogers@google.com>, Ingo Molnar <mingo@redhat.com>,
	Jaegeuk Kim <jaegeuk@kernel.org>,
	Jakub Kicinski <kuba@kernel.org>, Jiri Olsa <jolsa@redhat.com>,
	Joe Perches <joe@perches.com>, Jonas Bonn <jonas@southpole.se>,
	Leo Yan <leo.yan@linaro.org>, Mark Rutland <mark.rutland@arm.com>,
	Namhyung Kim <namhyung@kernel.org>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Rich Felker <dalias@libc.org>,
	Samuel Mendoza-Jonas <sam@mendozajonas.com>,
	Sean Christopherson <seanjc@google.com>,
	Sergey Senozhatsky <senozhatsky@chromium.org>,
	Shuah Khan <shuah@kernel.org>,
	Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>,
	Steven Rostedt <rostedt@goodmis.org>, Tejun Heo <tj@kernel.org>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	Ulf Hansson <ulf.hansson@linaro.org>,
	Will Deacon <will@kernel.org>,
	Wolfram Sang <wsa+renesas@sang-engineering.com>,
	Yoshinori Sato <ysato@users.sourceforge.jp>
Subject: Re: [PATCH 11/17] find: micro-optimize for_each_{set,clear}_bit()
Date: Mon, 30 Aug 2021 09:15:05 -0700	[thread overview]
Message-ID: <YS0ECS2Lb4rwqJ4b@yury-ThinkPad> (raw)
In-Reply-To: <20210830121249.2fgyvf47py2tz5s5@pathway.suse.cz>

On Mon, Aug 30, 2021 at 02:12:49PM +0200, Petr Mladek wrote:
> On Thu 2021-08-26 14:09:55, Yury Norov wrote:
> > On Thu, Aug 26, 2021 at 03:57:13PM +0200, Petr Mladek wrote:
> > > On Sat 2021-08-14 14:17:07, Yury Norov wrote:
> > > > The macros iterate thru all set/clear bits in a bitmap. They search a
> > > > first bit using find_first_bit(), and the rest bits using find_next_bit().
> > > > 
> > > > Since find_next_bit() is called shortly after find_first_bit(), we can
> > > > save few lines of I-cache by not using find_first_bit().
> > > 
> > > Is this only a speculation or does it fix a real performance problem?
> > > 
> > > The macro is used like:
> > > 
> > > 	for_each_set_bit(bit, addr, size) {
> > > 		fn(bit);
> > > 	}
> > > 
> > > IMHO, the micro-opimization does not help when fn() is non-trivial.
> >  
> > The effect is measurable:
> > 
> > Start testing for_each_bit()
> > for_each_set_bit:                15296 ns,   1000 iterations
> > for_each_set_bit_from:           15225 ns,   1000 iterations
> > 
> > Start testing for_each_bit() with cash flushing
> > for_each_set_bit:               547626 ns,   1000 iterations
> > for_each_set_bit_from:          497899 ns,   1000 iterations
> > 
> > Refer this:
> > 
> > https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg356151.html
> 
> I see. The results look convincing on the first look.
> 
> But I am still not sure. This patch is basically contradicting many
> other patches from this patchset:
> 
>   + 5th patch optimizes find_first_and_bit() and proves that it is
>     much faster:
> 
>     Before (#define find_first_and_bit(...) find_next_and_bit(..., 0):
>     Start testing find_bit() with random-filled bitmap
>     [  140.291468] find_first_and_bit:           46890919 ns,  32671 iterations
>     Start testing find_bit() with sparse bitmap
>     [  140.295028] find_first_and_bit:               7103 ns,      1 iterations
> 
>     After:
>     Start testing find_bit() with random-filled bitmap
>     [  162.574907] find_first_and_bit:           25045813 ns,  32846 iterations
>     Start testing find_bit() with sparse bitmap
>     [  162.578458] find_first_and_bit:               4900 ns,      1 iterations
> 
>        => saves 46% in random bitmap
> 	  saves 31% in sparse bitmap
> 
> 
>   + 6th, 7th, and 9th patch makes the code use find_first_bit()
>     because it is faster than find_next_bit(mask, size, 0);
> 
>   + Now, 11th (this) patch replaces find_first_bit() with
>     find_next_bit(mask, size, 0) because find_first_bit()
>     makes things slower. It is suspicious at minimum.
> 
> 
> By other words. The I-cache could safe 10% in one case.
> But find_first_bit() might safe 46% in random case.

Those are different cases. find_first_bit() is approximately twice
faster than find_next_bit, and much smaller. The conclusion is simple:
use 'first' version whenever possible if there's no other considerations.

In case of for_each_bit() macros, however, we have such a consideration.
In contrast to regular pattern, where user calls either first, or next
versions N times, here we call find_first_bit once, and then find_next_bit
N-1 times.

Because we know for sure that we'll call find_next_bit shortly, we can
benefit from locality under heavy pressure on I-cache, if replace 'first'
with 'next'. Consider it as a prefetch mechanism for the following calls
to find_next_bit().

> Does I-cache cost more than the faster code?
 
In this case cache miss is more expensive.

> Or was for_each_set_bit() tested only with a bitmap
> where find_first_bit() optimization did not help much?

I tried to ensure that the effect of I-cache is real and in this case
more important than code performance, so in the test I called 'first'
once and 'next' twice.

> How would for_each_set_bit() work with random bitmap?

It would work for all bitmaps.

> How does it work with larger bitmaps?

Percentage gain (but not absolute) will decrease proportionally to the
number of calls of find_next_bit() for big N.

Thanks,
Yury

> Best Regards,
> Petr

  reply	other threads:[~2021-08-30 16:15 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-14 21:16 [PATCH RESEND 00/17] Resend bitmap patches Yury Norov
2021-08-14 21:16 ` [PATCH 01/17] bitops: protect find_first_{,zero}_bit properly Yury Norov
2021-08-14 21:16 ` [PATCH 02/17] bitops: move find_bit_*_le functions from le.h to find.h Yury Norov
2021-08-14 21:16 ` [PATCH 03/17] include: move find.h from asm_generic to linux Yury Norov
2021-08-14 21:17 ` [PATCH 04/17] arch: remove GENERIC_FIND_FIRST_BIT entirely Yury Norov
2021-08-14 21:17 ` [PATCH 05/17] lib: add find_first_and_bit() Yury Norov
2021-08-14 21:17 ` [PATCH 06/17] cpumask: use find_first_and_bit() Yury Norov
2021-08-14 21:17 ` [PATCH 07/17] all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate Yury Norov
2021-08-14 21:17 ` [PATCH 08/17] tools: sync tools/bitmap with mother linux Yury Norov
2021-08-14 21:17 ` [PATCH 09/17] cpumask: replace cpumask_next_* with cpumask_first_* where appropriate Yury Norov
2021-08-14 21:17 ` [PATCH 10/17] include/linux: move for_each_bit() macros from bitops.h to find.h Yury Norov
2021-08-14 21:17 ` [PATCH 11/17] find: micro-optimize for_each_{set,clear}_bit() Yury Norov
2021-08-26 13:57   ` Petr Mladek
2021-08-26 21:09     ` Yury Norov
2021-08-30 12:12       ` Petr Mladek
2021-08-30 16:15         ` Yury Norov [this message]
2021-08-14 21:17 ` [PATCH 12/17] Replace for_each_*_bit_from() with for_each_*_bit() where appropriate Yury Norov
2021-08-14 21:17 ` [PATCH 13/17] tools: Rename bitmap_alloc() to bitmap_zalloc() Yury Norov
2021-08-14 21:17 ` [PATCH 14/17] mm/percpu: micro-optimize pcpu_is_populated() Yury Norov
2021-08-14 21:17 ` [PATCH 15/17] bitmap: unify find_bit operations Yury Norov
2021-08-14 21:17 ` [PATCH 16/17] lib: bitmap: add performance test for bitmap_print_to_pagebuf Yury Norov
2021-08-14 21:17 ` [PATCH 17/17] vsprintf: rework bitmap_list_string Yury Norov
2021-08-15 11:09   ` Andy Shevchenko
2021-08-15 11:09     ` Andy Shevchenko
2021-08-17 16:35     ` Yury Norov
2021-08-17 16:35       ` Yury Norov
2021-08-26 14:15   ` Petr Mladek
2021-08-26 20:59     ` Yury Norov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YS0ECS2Lb4rwqJ4b@yury-ThinkPad \
    --to=yury.norov@gmail.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=acme@redhat.com \
    --cc=aklimov@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=alobakin@pm.me \
    --cc=andrea.merello@gmail.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=arnd@arndb.de \
    --cc=bcain@codeaurora.org \
    --cc=benh@kernel.crashing.org \
    --cc=bgardon@google.com \
    --cc=bristot@redhat.com \
    --cc=catalin.marinas@arm.com \
    --cc=cl@linux.com \
    --cc=dalias@libc.org \
    --cc=david@redhat.com \
    --cc=dennis@kernel.org \
    --cc=geert@linux-m68k.org \
    --cc=hca@linux.ibm.com \
    --cc=irogers@google.com \
    --cc=jaegeuk@kernel.org \
    --cc=joe@perches.com \
    --cc=jolsa@redhat.com \
    --cc=jonas@southpole.se \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=leo.yan@linaro.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-mmc@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=palmer@dabbelt.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rostedt@goodmis.org \
    --cc=sam@mendozajonas.com \
    --cc=seanjc@google.com \
    --cc=senozhatsky@chromium.org \
    --cc=shuah@kernel.org \
    --cc=stefan.kristiansson@saunalahti.fi \
    --cc=tj@kernel.org \
    --cc=tsbogend@alpha.franken.de \
    --cc=ulf.hansson@linaro.org \
    --cc=will@kernel.org \
    --cc=wsa+renesas@sang-engineering.com \
    --cc=ysato@users.sourceforge.jp \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.