All of lore.kernel.org
 help / color / mirror / Atom feed
From: Petr Mladek <pmladek@suse.com>
To: Yury Norov <yury.norov@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-arch@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-mmc@vger.kernel.org, linux-perf-users@vger.kernel.org,
	kvm@vger.kernel.org,
	"James E.J. Bottomley" <James.Bottomley@hansenpartnership.com>,
	Alexander Lobakin <alobakin@pm.me>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Alexey Klimov <aklimov@redhat.com>,
	Andrea Merello <andrea.merello@gmail.com>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>,
	Arnd Bergmann <arnd@arndb.de>, Ben Gardon <bgardon@google.com>,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Brian Cain <bcain@codeaurora.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Christoph Lameter <cl@linux.com>,
	Daniel Bristot de Oliveira <bristot@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	Dennis Zhou <dennis@kernel.org>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Heiko Carstens <hca@linux.ibm.com>,
	Ian Rogers <irogers@google.com>, Ingo Molnar <mingo@redhat.com>,
	Jaegeuk Kim <jaegeuk@kernel.org>,
	Jakub Kicinski <kuba@kernel.org>, Jiri Olsa <jolsa@redhat.com>,
	Joe Perches <joe@perches.com>, Jonas Bonn <jonas@southpole.se>,
	Leo Yan <leo.yan@linaro.org>, Mark Rutland <mark.rutland@arm.com>,
	Namhyung Kim <namhyung@kernel.org>,
	Palmer Dabbelt <palmer@dabbelt.com>,
	Paolo Bonzini <pbonzini@redhat.com>, Peter Xu <peterx@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Rasmus Villemoes <linux@rasmusvillemoes.dk>,
	Rich Felker <dalias@libc.org>,
	Samuel Mendoza-Jonas <sam@mendozajonas.com>,
	Sean Christopherson <seanjc@google.com>,
	Sergey Senozhatsky <senozhatsky@chromium.org>,
	Shuah Khan <shuah@kernel.org>,
	Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>,
	Steven Rostedt <rostedt@goodmis.org>, Tejun Heo <tj@kernel.org>,
	Thomas Bogendoerfer <tsbogend@alpha.franken.de>,
	Ulf Hansson <ulf.hansson@linaro.org>,
	Will Deacon <will@kernel.org>,
	Wolfram Sang <wsa+renesas@sang-engineering.com>,
	Yoshinori Sato <ysato@users.sourceforge.jp>
Subject: Re: [PATCH 11/17] find: micro-optimize for_each_{set,clear}_bit()
Date: Mon, 30 Aug 2021 14:12:49 +0200	[thread overview]
Message-ID: <20210830121249.2fgyvf47py2tz5s5@pathway.suse.cz> (raw)
In-Reply-To: <YSgDI9NpC51GhB/2@yury-ThinkPad>

On Thu 2021-08-26 14:09:55, Yury Norov wrote:
> On Thu, Aug 26, 2021 at 03:57:13PM +0200, Petr Mladek wrote:
> > On Sat 2021-08-14 14:17:07, Yury Norov wrote:
> > > The macros iterate thru all set/clear bits in a bitmap. They search a
> > > first bit using find_first_bit(), and the rest bits using find_next_bit().
> > > 
> > > Since find_next_bit() is called shortly after find_first_bit(), we can
> > > save few lines of I-cache by not using find_first_bit().
> > 
> > Is this only a speculation or does it fix a real performance problem?
> > 
> > The macro is used like:
> > 
> > 	for_each_set_bit(bit, addr, size) {
> > 		fn(bit);
> > 	}
> > 
> > IMHO, the micro-opimization does not help when fn() is non-trivial.
>  
> The effect is measurable:
> 
> Start testing for_each_bit()
> for_each_set_bit:                15296 ns,   1000 iterations
> for_each_set_bit_from:           15225 ns,   1000 iterations
> 
> Start testing for_each_bit() with cash flushing
> for_each_set_bit:               547626 ns,   1000 iterations
> for_each_set_bit_from:          497899 ns,   1000 iterations
> 
> Refer this:
> 
> https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg356151.html

I see. The results look convincing on the first look.

But I am still not sure. This patch is basically contradicting many
other patches from this patchset:

  + 5th patch optimizes find_first_and_bit() and proves that it is
    much faster:

    Before (#define find_first_and_bit(...) find_next_and_bit(..., 0):
    Start testing find_bit() with random-filled bitmap
    [  140.291468] find_first_and_bit:           46890919 ns,  32671 iterations
    Start testing find_bit() with sparse bitmap
    [  140.295028] find_first_and_bit:               7103 ns,      1 iterations

    After:
    Start testing find_bit() with random-filled bitmap
    [  162.574907] find_first_and_bit:           25045813 ns,  32846 iterations
    Start testing find_bit() with sparse bitmap
    [  162.578458] find_first_and_bit:               4900 ns,      1 iterations

       => saves 46% in random bitmap
	  saves 31% in sparse bitmap


  + 6th, 7th, and 9th patch makes the code use find_first_bit()
    because it is faster than find_next_bit(mask, size, 0);

  + Now, 11th (this) patch replaces find_first_bit() with
    find_next_bit(mask, size, 0) because find_first_bit()
    makes things slower. It is suspicious at minimum.


By other words. The I-cache could safe 10% in one case.
But find_first_bit() might safe 46% in random case.

Does I-cache cost more than the faster code?

Or was for_each_set_bit() tested only with a bitmap
where find_first_bit() optimization did not help much?

How would for_each_set_bit() work with random bitmap?
How does it work with larger bitmaps?

Best Regards,
Petr

  reply	other threads:[~2021-08-30 12:13 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-08-14 21:16 [PATCH RESEND 00/17] Resend bitmap patches Yury Norov
2021-08-14 21:16 ` [PATCH 01/17] bitops: protect find_first_{,zero}_bit properly Yury Norov
2021-08-14 21:16 ` [PATCH 02/17] bitops: move find_bit_*_le functions from le.h to find.h Yury Norov
2021-08-14 21:16 ` [PATCH 03/17] include: move find.h from asm_generic to linux Yury Norov
2021-08-14 21:17 ` [PATCH 04/17] arch: remove GENERIC_FIND_FIRST_BIT entirely Yury Norov
2021-08-14 21:17 ` [PATCH 05/17] lib: add find_first_and_bit() Yury Norov
2021-08-14 21:17 ` [PATCH 06/17] cpumask: use find_first_and_bit() Yury Norov
2021-08-14 21:17 ` [PATCH 07/17] all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate Yury Norov
2021-08-14 21:17 ` [PATCH 08/17] tools: sync tools/bitmap with mother linux Yury Norov
2021-08-14 21:17 ` [PATCH 09/17] cpumask: replace cpumask_next_* with cpumask_first_* where appropriate Yury Norov
2021-08-14 21:17 ` [PATCH 10/17] include/linux: move for_each_bit() macros from bitops.h to find.h Yury Norov
2021-08-14 21:17 ` [PATCH 11/17] find: micro-optimize for_each_{set,clear}_bit() Yury Norov
2021-08-26 13:57   ` Petr Mladek
2021-08-26 21:09     ` Yury Norov
2021-08-30 12:12       ` Petr Mladek [this message]
2021-08-30 16:15         ` Yury Norov
2021-08-14 21:17 ` [PATCH 12/17] Replace for_each_*_bit_from() with for_each_*_bit() where appropriate Yury Norov
2021-08-14 21:17 ` [PATCH 13/17] tools: Rename bitmap_alloc() to bitmap_zalloc() Yury Norov
2021-08-14 21:17 ` [PATCH 14/17] mm/percpu: micro-optimize pcpu_is_populated() Yury Norov
2021-08-14 21:17 ` [PATCH 15/17] bitmap: unify find_bit operations Yury Norov
2021-08-14 21:17 ` [PATCH 16/17] lib: bitmap: add performance test for bitmap_print_to_pagebuf Yury Norov
2021-08-14 21:17 ` [PATCH 17/17] vsprintf: rework bitmap_list_string Yury Norov
2021-08-15 11:09   ` Andy Shevchenko
2021-08-15 11:09     ` Andy Shevchenko
2021-08-17 16:35     ` Yury Norov
2021-08-17 16:35       ` Yury Norov
2021-08-26 14:15   ` Petr Mladek
2021-08-26 20:59     ` Yury Norov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210830121249.2fgyvf47py2tz5s5@pathway.suse.cz \
    --to=pmladek@suse.com \
    --cc=James.Bottomley@hansenpartnership.com \
    --cc=acme@redhat.com \
    --cc=aklimov@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=alobakin@pm.me \
    --cc=andrea.merello@gmail.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=arnd@arndb.de \
    --cc=bcain@codeaurora.org \
    --cc=benh@kernel.crashing.org \
    --cc=bgardon@google.com \
    --cc=bristot@redhat.com \
    --cc=catalin.marinas@arm.com \
    --cc=cl@linux.com \
    --cc=dalias@libc.org \
    --cc=david@redhat.com \
    --cc=dennis@kernel.org \
    --cc=geert@linux-m68k.org \
    --cc=hca@linux.ibm.com \
    --cc=irogers@google.com \
    --cc=jaegeuk@kernel.org \
    --cc=joe@perches.com \
    --cc=jolsa@redhat.com \
    --cc=jonas@southpole.se \
    --cc=kuba@kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=leo.yan@linaro.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-mmc@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=linux@rasmusvillemoes.dk \
    --cc=mark.rutland@arm.com \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=palmer@dabbelt.com \
    --cc=pbonzini@redhat.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=sam@mendozajonas.com \
    --cc=seanjc@google.com \
    --cc=senozhatsky@chromium.org \
    --cc=shuah@kernel.org \
    --cc=stefan.kristiansson@saunalahti.fi \
    --cc=tj@kernel.org \
    --cc=tsbogend@alpha.franken.de \
    --cc=ulf.hansson@linaro.org \
    --cc=will@kernel.org \
    --cc=wsa+renesas@sang-engineering.com \
    --cc=ysato@users.sourceforge.jp \
    --cc=yury.norov@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.