From: Alexander Lobakin <aleksander.lobakin@intel.com>
To: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	Paolo Abeni <pabeni@redhat.com>
Cc: Alexander Lobakin <aleksander.lobakin@intel.com>,
	Yury Norov <yury.norov@gmail.com>,
	Alexander Potapenko <glider@google.com>,
	nex.sw.ncis.osdt.itp.upstreaming@intel.com,
	intel-wired-lan@lists.osuosl.org,
	netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Przemek Kitszel <przemyslaw.kitszel@intel.com>
Subject: [PATCH net-next v6 06/21] bitops: let the compiler optimize {__,}assign_bit()
Date: Wed, 27 Mar 2024 16:23:43 +0100
Message-ID: <20240327152358.2368467-7-aleksander.lobakin@intel.com>
In-Reply-To: <20240327152358.2368467-1-aleksander.lobakin@intel.com>

Since commit b03fc1173c0c ("bitops: let optimize out non-atomic bitops
on compile-time constants"), the compilers are able to expand inline
bitmap operations to compile-time initializers when possible.
However, during the round of replacing if-__set-else-__clear with
__assign_bit() as per Andy's advice, bloat-o-meter showed a +1024 byte
difference in object code size for one module (even one function),
where the pattern:

	DECLARE_BITMAP(foo) = { }; // on the stack, zeroed

	if (a)
		__set_bit(const_bit_num, foo);
	if (b)
		__set_bit(another_const_bit_num, foo);
	...

is heavily used, although there should be no difference: the bitmap is
zeroed, so the second half of __assign_bit() should be compiled out as
a no-op.
I either missed the fact that __assign_bit() has its bitmap pointer
marked as `volatile` (as we usually do for bitops) or was hoping that
the compilers would at least try to look past the `volatile` for
__always_inline functions. Anyhow, due to that attribute, the compilers
were always compiling the whole expression, and none of the
compile-time optimizations mentioned above were applied.

Convert __assign_bit() to a macro since it's a very simple if-else and
all of the checks are performed inside __set_bit() and __clear_bit(),
thus that wrapper has to be as transparent as possible. After that
change, although it shows only a -20 byte change for vmlinux (since
__assign_bit() still has relatively few users), no drastic code size
changes happen when replacing if-set-else-clear for on-stack bitmaps
with __assign_bit(), meaning the compiler now expands them to the
actual operations with all the expected optimizations.

Atomic assign_bit() is less affected due to its nature, but let's
convert it to a macro as well to keep the code consistent and not leave
room for possible suboptimal codegen. Moreover, with certain kernel
configurations it actually saves some bytes (x86):

do_ip_setsockopt    4154    4099     -55

Suggested-by: Yury Norov <yury.norov@gmail.com> # assign_bit(), too
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
Acked-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/linux/bitops.h | 20 ++++----------------
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/include/linux/bitops.h b/include/linux/bitops.h
index e0cd09eb91cd..b25dc8742124 100644
--- a/include/linux/bitops.h
+++ b/include/linux/bitops.h
@@ -275,23 +275,11 @@ static inline unsigned long fns(unsigned long word, unsigned int n)
  * @addr: the address to start counting from
  * @value: the value to assign
  */
-static __always_inline void assign_bit(long nr, volatile unsigned long *addr,
-				       bool value)
-{
-	if (value)
-		set_bit(nr, addr);
-	else
-		clear_bit(nr, addr);
-}
+#define assign_bit(nr, addr, value)					\
+	((value) ? set_bit((nr), (addr)) : clear_bit((nr), (addr)))
 
-static __always_inline void __assign_bit(long nr, volatile unsigned long *addr,
-					 bool value)
-{
-	if (value)
-		__set_bit(nr, addr);
-	else
-		__clear_bit(nr, addr);
-}
+#define __assign_bit(nr, addr, value)					\
+	((value) ? __set_bit((nr), (addr)) : __clear_bit((nr), (addr)))
 
 /**
  * __ptr_set_bit - Set bit in a pointer's value
-- 
2.44.0
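
For illustration only (not part of the patch): a minimal userspace
sketch of the macro shape used above, applied to a zeroed on-stack
bitmap. The sketch_* names and the one-word bitmap are assumptions made
up for this example; the real __set_bit()/__clear_bit() live in the
kernel headers and do more checking. The helpers here deliberately take
a plain (non-volatile) pointer, so a compiler is free to fold the
"clear a bit in a zeroed word" branch away; whether it actually does
depends on the compiler and optimization level.

```c
#include <stdbool.h>

#define BITS_PER_LONG	(8 * sizeof(unsigned long))

/* Hypothetical stand-in for __set_bit(): non-atomic, no volatile. */
static inline void sketch_set_bit(unsigned long nr, unsigned long *addr)
{
	addr[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

/* Hypothetical stand-in for __clear_bit(): non-atomic, no volatile. */
static inline void sketch_clear_bit(unsigned long nr, unsigned long *addr)
{
	addr[nr / BITS_PER_LONG] &= ~(1UL << (nr % BITS_PER_LONG));
}

/*
 * Macro wrapper in the same shape as the new __assign_bit(): a bare
 * ternary that adds no pointer qualifiers of its own on top of the
 * underlying helpers.
 */
#define sketch_assign_bit(nr, addr, value)				\
	((value) ? sketch_set_bit((nr), (addr)) :			\
		   sketch_clear_bit((nr), (addr)))

/* Build a one-word, zeroed on-stack bitmap from two runtime flags. */
static unsigned long sketch_build_flags(bool a, bool b)
{
	unsigned long foo[1] = { 0 };	/* on the stack, zeroed */

	sketch_assign_bit(3, foo, a);	/* constant bit numbers */
	sketch_assign_bit(5, foo, b);

	return foo[0];
}
```

Calling sketch_build_flags(true, false) sets only bit 3, and
sketch_build_flags(false, false) leaves the word at zero, which is the
case where the clear branch has nothing to do.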