From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>,
	Yury Norov <yury.norov@gmail.com>,
	Alexander Potapenko <glider@google.com>,
	nex.sw.ncis.osdt.itp.upstreaming@intel.com,
	intel-wired-lan@lists.osuosl.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Przemek Kitszel <przemyslaw.kitszel@intel.com>
Subject: [PATCH net-next v6 06/21] bitops: let the compiler optimize {__,}assign_bit()
Date: Wed, 27 Mar 2024 16:23:43 +0100
Message-ID: <20240327152358.2368467-7-aleksander.lobakin@intel.com>
In-Reply-To: <20240327152358.2368467-1-aleksander.lobakin@intel.com>

Since commit b03fc1173c0c ("bitops: let optimize out non-atomic bitops
on compile-time constants"), the compilers are able to expand inline
bitmap operations to compile-time initializers when possible.
However, while replacing if-__set-else-__clear sequences with
__assign_bit() as per Andy's advice, bloat-o-meter showed a +1024 byte
difference in object code size for one module (in a single function,
even), where the pattern:

	DECLARE_BITMAP(foo) = { }; // on the stack, zeroed

	if (a)
		__set_bit(const_bit_num, foo);
	if (b)
		__set_bit(another_const_bit_num, foo);
	...

is heavily used, although there should be no difference: the bitmap is
zeroed, so the second half of __assign_bit() (the __clear_bit() branch)
should be compiled out as a no-op.
I either missed the fact that __assign_bit() has its bitmap pointer
marked as `volatile` (as we usually do for bitops) or was hoping that
the compilers would at least try to look past the `volatile` for
__always_inline functions. Either way, due to that qualifier, the
compilers were always emitting the whole expression and none of the
aforementioned compile-time optimizations were working.
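
The effect of the qualifier can be sketched in plain userspace C. This
is only an illustration, not kernel code: zero_then_set_plain() and
zero_then_set_volatile() are made-up names. Both return the same value,
but GCC and Clang in practice treat every access through a
volatile-qualified lvalue as one they must not elide, so only the first
function is a candidate for being folded into a constant. Comparing the
output of `-O2 -S` for the two should show the difference; the exact
codegen depends on the compiler and version.

```c
/* Plain accesses: the compiler is free to fold this into a constant. */
static inline unsigned long zero_then_set_plain(void)
{
	unsigned long word = 0;		/* zeroed, on the stack */
	unsigned long *p = &word;

	*p |= 1UL << 3;
	return *p;
}

/* Same logic, but the load and store go through a volatile lvalue. */
static inline unsigned long zero_then_set_volatile(void)
{
	unsigned long word = 0;		/* zeroed, on the stack */
	volatile unsigned long *p = &word;

	*p |= 1UL << 3;
	return *p;
}
```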

Convert __assign_bit() to a macro: it's a very simple if-else and
all of the checks are performed inside __set_bit() and __clear_bit(),
so the wrapper has to be as transparent as possible. After that
change, although vmlinux shrinks by only 20 bytes (since __assign_bit()
is still relatively rarely used), no drastic code size changes
happen when replacing if-set-else-clear for on-stack bitmaps with
__assign_bit(), meaning the compiler now expands them to the actual
operations with all the expected optimizations.
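
The shape of the conversion can be tried out in userspace with a
minimal sketch like the one below. All demo_* names are hypothetical
stand-ins; the real __set_bit() and __clear_bit() are more involved and
are not reproduced here. The point is only that the wrapper is a bare
conditional expression, so no `volatile`-qualified parameter sits
between the caller's bitmap and the underlying operations.

```c
#include <limits.h>
#include <stdbool.h>

#define DEMO_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-ins for __set_bit()/__clear_bit(); not kernel code. */
static inline void demo_set_bit(unsigned long nr, unsigned long *addr)
{
	addr[nr / DEMO_BITS_PER_LONG] |= 1UL << (nr % DEMO_BITS_PER_LONG);
}

static inline void demo_clear_bit(unsigned long nr, unsigned long *addr)
{
	addr[nr / DEMO_BITS_PER_LONG] &= ~(1UL << (nr % DEMO_BITS_PER_LONG));
}

/* Same shape as the new macro: a conditional with two void operands. */
#define demo_assign_bit(nr, addr, value)				\
	((value) ? demo_set_bit((nr), (addr)) : demo_clear_bit((nr), (addr)))

/* Fills a zeroed on-stack bitmap from two runtime conditions. */
static inline unsigned long demo_build_flags(bool a, bool b)
{
	unsigned long flags[1] = { 0 };

	demo_assign_bit(0, flags, a);
	demo_assign_bit(5, flags, b);

	return flags[0];
}
```

Each macro argument is still evaluated exactly once, because only one
arm of the conditional runs, so the usual double-evaluation hazard of
function-like macros does not apply to this form.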

Atomic assign_bit() is less affected due to its nature, but let's
convert it to a macro as well to keep the code consistent and leave
no room for possible suboptimal codegen. Moreover, with certain
kernel configurations it actually yields some savings (x86):

function             old     new   delta
do_ip_setsockopt    4154    4099     -55

Suggested-by: Yury Norov <yury.norov@gmail.com> # assign_bit(), too
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/linux/bitops.h | 20 ++++----------------
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index e0cd09eb91cd..b25dc8742124 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -275,23 +275,11 @@ static inline unsigned long fns(unsigned long word, unsigned int n)
  * @addr: the address to start counting from
  * @value: the value to assign
  */
-static __always_inline void assign_bit(long nr, volatile unsigned long *addr,
-				       bool value)
-{
-	if (value)
-		set_bit(nr, addr);
-	else
-		clear_bit(nr, addr);
-}
+#define assign_bit(nr, addr, value)					\
+	((value) ? set_bit((nr), (addr)) : clear_bit((nr), (addr)))
 
-static __always_inline void __assign_bit(long nr, volatile unsigned long *addr,
-					 bool value)
-{
-	if (value)
-		__set_bit(nr, addr);
-	else
-		__clear_bit(nr, addr);
-}
+#define __assign_bit(nr, addr, value)					\
+	((value) ? __set_bit((nr), (addr)) : __clear_bit((nr), (addr)))
 
 /**
  * __ptr_set_bit - Set bit in a pointer's value
-- 
2.44.0



