All of lore.kernel.org
 help / color / mirror / Atom feed
From: William Breathitt Gray <vilhelm.gray@gmail.com>
To: Arnd Bergmann <arnd@kernel.org>
Cc: Syed Nayyar Waris <syednwaris@gmail.com>,
	Linus Walleij <linus.walleij@linaro.org>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Michal Simek <michal.simek@xilinx.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Robert Richter <rrichter@marvell.com>,
	Bartosz Golaszewski <bgolaszewski@baylibre.com>,
	Masahiro Yamada <yamada.masahiro@socionext.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Zhang Rui <rui.zhang@intel.com>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Amit Kucheria <amit.kucheria@verdurent.com>,
	linux-arch <linux-arch@vger.kernel.org>,
	"open list:GPIO SUBSYSTEM" <linux-gpio@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>,
	Linux PM list <linux-pm@vger.kernel.org>
Subject: Re: [PATCH 1/5] clump_bits: Introduce the for_each_set_clump macro
Date: Mon, 28 Dec 2020 07:10:55 -0500	[thread overview]
Message-ID: <X+nLT8bMsKJb7nug@shinobu> (raw)
In-Reply-To: <CAK8P3a35N1TvRQsGt+G52XSx0N4FQe_76pU4sf4EiH3Gq=s66A@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3375 bytes --]

On Sun, Dec 27, 2020 at 11:03:06PM +0100, Arnd Bergmann wrote:
> On Sat, Dec 26, 2020 at 7:42 AM Syed Nayyar Waris <syednwaris@gmail.com> wrote:
> >
> > This macro iterates for each group of bits (clump) with set bits,
> > within a bitmap memory region. For each iteration, "start" is set to
> > the bit offset of the found clump, while the respective clump value is
> > stored to the location pointed by "clump". Additionally, the
> > bitmap_get_value() and bitmap_set_value() functions are introduced to
> > respectively get and set a value of n-bits in a bitmap memory region.
> > The n-bits can have any size from 1 to BITS_PER_LONG. size less
> > than 1 or more than BITS_PER_LONG causes undefined behaviour.
> > Moreover, during setting value of n-bit in bitmap, if a situation arise
> > that the width of next n-bit is exceeding the word boundary, then it
> > will divide itself such that some portion of it is stored in that word,
> > while the remaining portion is stored in the next higher word. Similar
> > situation occurs while retrieving the value from bitmap.
> >
> > GCC gives warning in bitmap_set_value(): https://godbolt.org/z/rjx34r
> > Add explicit check to see if the value being written into the bitmap
> > does not fall outside the bitmap.
> > The situation that it is falling outside would never be possible in the
> > code because the boundaries are required to be correct before the
> > function is called. The responsibility is on the caller for ensuring the
> > boundaries are correct.
> > The code change is simply to silence the GCC warning messages
> > because GCC is not aware that the boundaries have already been checked.
> > As such, we're better off using __builtin_unreachable() here because we
> > can avoid the latency of the conditional check entirely.
> 
> Didn't the __builtin_unreachable() end up leading to an objtool
> warning about incorrect stack frames for the code path that leads
> into the undefined behavior? I thought I saw a message from the 0day
> build bot about that and didn't expect to see it again after that.
> 
> Can you actually measure any performance difference compared
> to BUG_ON() that avoids the undefined behavior? Practically
> all CPUs from the past 20 years have branch predictors that should
> completely hide measurable overhead from this.
> 
>       Arnd

When I initially recommended using __builtin_unreachable(), I was
anticipating the use of bitmap_set_value() in kernel at large -- so the
possible performance hit from a conditional check was a concern for me.
However, now that we're restricting the scope of bitmap_set_value() to
only the GPIO subsystem, such optimization is no longer a major concern
I feel: gpio-xilinx is the only driver utilizing bitmap_set_value() --
and we know it won't be called in a loop -- so whatever hypothetical
performance hit there might be is inconsequential in the end.

Instead, we should focus on code clarity now. I believe it makes sense
given the new scope of this function to revert back to the earlier
suggestion of passing in and checking the boundary explicitly, and to
remove the __builtin_unreachable() call for now. If bitmap_set_value()
becomes available to the rest of the kernel in the future, we can
reconsider whether or not to use __builtin_unreachable().

William Breathitt Gray

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

WARNING: multiple messages have this Message-ID (diff)
From: William Breathitt Gray <vilhelm.gray@gmail.com>
To: Arnd Bergmann <arnd@kernel.org>
Cc: linux-arch <linux-arch@vger.kernel.org>,
	Amit Kucheria <amit.kucheria@verdurent.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Masahiro Yamada <yamada.masahiro@socionext.com>,
	Linus Walleij <linus.walleij@linaro.org>,
	Daniel Lezcano <daniel.lezcano@linaro.org>,
	Michal Simek <michal.simek@xilinx.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Bartosz Golaszewski <bgolaszewski@baylibre.com>,
	Robert Richter <rrichter@marvell.com>,
	"open list:GPIO SUBSYSTEM" <linux-gpio@vger.kernel.org>,
	Linux PM list <linux-pm@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Shevchenko <andriy.shevchenko@linux.intel.com>,
	Syed Nayyar Waris <syednwaris@gmail.com>,
	Zhang Rui <rui.zhang@intel.com>,
	Linux ARM <linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH 1/5] clump_bits: Introduce the for_each_set_clump macro
Date: Mon, 28 Dec 2020 07:10:55 -0500	[thread overview]
Message-ID: <X+nLT8bMsKJb7nug@shinobu> (raw)
In-Reply-To: <CAK8P3a35N1TvRQsGt+G52XSx0N4FQe_76pU4sf4EiH3Gq=s66A@mail.gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 3375 bytes --]

On Sun, Dec 27, 2020 at 11:03:06PM +0100, Arnd Bergmann wrote:
> On Sat, Dec 26, 2020 at 7:42 AM Syed Nayyar Waris <syednwaris@gmail.com> wrote:
> >
> > This macro iterates for each group of bits (clump) with set bits,
> > within a bitmap memory region. For each iteration, "start" is set to
> > the bit offset of the found clump, while the respective clump value is
> > stored to the location pointed by "clump". Additionally, the
> > bitmap_get_value() and bitmap_set_value() functions are introduced to
> > respectively get and set a value of n-bits in a bitmap memory region.
> > The n-bits can have any size from 1 to BITS_PER_LONG. size less
> > than 1 or more than BITS_PER_LONG causes undefined behaviour.
> > Moreover, during setting value of n-bit in bitmap, if a situation arise
> > that the width of next n-bit is exceeding the word boundary, then it
> > will divide itself such that some portion of it is stored in that word,
> > while the remaining portion is stored in the next higher word. Similar
> > situation occurs while retrieving the value from bitmap.
> >
> > GCC gives warning in bitmap_set_value(): https://godbolt.org/z/rjx34r
> > Add explicit check to see if the value being written into the bitmap
> > does not fall outside the bitmap.
> > The situation that it is falling outside would never be possible in the
> > code because the boundaries are required to be correct before the
> > function is called. The responsibility is on the caller for ensuring the
> > boundaries are correct.
> > The code change is simply to silence the GCC warning messages
> > because GCC is not aware that the boundaries have already been checked.
> > As such, we're better off using __builtin_unreachable() here because we
> > can avoid the latency of the conditional check entirely.
> 
> Didn't the __builtin_unreachable() end up leading to an objtool
> warning about incorrect stack frames for the code path that leads
> into the undefined behavior? I thought I saw a message from the 0day
> build bot about that and didn't expect to see it again after that.
> 
> Can you actually measure any performance difference compared
> to BUG_ON() that avoids the undefined behavior? Practically
> all CPUs from the past 20 years have branch predictors that should
> completely hide measurable overhead from this.
> 
>       Arnd

When I initially recommended using __builtin_unreachable(), I was
anticipating the use of bitmap_set_value() in kernel at large -- so the
possible performance hit from a conditional check was a concern for me.
However, now that we're restricting the scope of bitmap_set_value() to
only the GPIO subsystem, such optimization is no longer a major concern
I feel: gpio-xilinx is the only driver utilizing bitmap_set_value() --
and we know it won't be called in a loop -- so whatever hypothetical
performance hit there might be is inconsequential in the end.

Instead, we should focus on code clarity now. I believe it makes sense
given the new scope of this function to revert back to the earlier
suggestion of passing in and checking the boundary explicitly, and to
remove the __builtin_unreachable() call for now. If bitmap_set_value()
becomes available to the rest of the kernel in the future, we can
reconsider whether or not to use __builtin_unreachable().

William Breathitt Gray

[-- Attachment #1.2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

[-- Attachment #2: Type: text/plain, Size: 176 bytes --]

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2020-12-28 12:12 UTC|newest]

Thread overview: 48+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-26  6:41 [PATCH 0/5] Introduce the for_each_set_clump macro Syed Nayyar Waris
2020-12-26  6:41 ` Syed Nayyar Waris
2020-12-26  6:42 ` [PATCH 1/5] clump_bits: " Syed Nayyar Waris
2020-12-26  6:42   ` Syed Nayyar Waris
2020-12-27 22:03   ` Arnd Bergmann
2020-12-27 22:03     ` Arnd Bergmann
2020-12-28 12:10     ` William Breathitt Gray [this message]
2020-12-28 12:10       ` William Breathitt Gray
2020-12-26  6:43 ` [PATCH 2/5] lib/test_bitmap.c: Add for_each_set_clump test cases Syed Nayyar Waris
2020-12-26  6:43   ` Syed Nayyar Waris
2020-12-26 14:43   ` kernel test robot
2020-12-26 14:48   ` kernel test robot
2020-12-26 15:03   ` kernel test robot
     [not found]   ` <CAHp75VcSsfDKY3w4ufZktXzRB=GiObAV6voPfmeAHcbdwX0uqg@mail.gmail.com>
2021-02-04  8:55     ` Syed Nayyar Waris
2021-02-04  8:55       ` Syed Nayyar Waris
2021-02-07  4:18       ` Syed Nayyar Waris
2021-02-07  4:18         ` Syed Nayyar Waris
2020-12-26  6:43 ` [PATCH 3/5] gpio: thunderx: Utilize for_each_set_clump macro Syed Nayyar Waris
2020-12-26  6:43   ` Syed Nayyar Waris
2020-12-26 16:26   ` kernel test robot
2020-12-26 20:18   ` kernel test robot
2020-12-26  6:44 ` [PATCH 4/5] gpio: xilinx: Utilize generic bitmap_get_value and _set_value Syed Nayyar Waris
2020-12-26  6:44   ` Syed Nayyar Waris
2020-12-27 21:29   ` Linus Walleij
2020-12-27 21:29     ` Linus Walleij
2021-01-05 11:04     ` Michal Simek
2021-01-05 11:04       ` Michal Simek
2020-12-29  0:50   ` kernel test robot
2020-12-29  0:50   ` kernel test robot
2020-12-26  6:45 ` [PATCH 5/5] gpio: xilinx: Add extra check if sum of widths exceed 64 Syed Nayyar Waris
2020-12-26  6:45   ` Syed Nayyar Waris
2020-12-26  8:31   ` kernel test robot
2020-12-28 11:58   ` William Breathitt Gray
2020-12-28 11:58     ` William Breathitt Gray
2021-01-05 11:01   ` Michal Simek
2021-01-05 11:01     ` Michal Simek
2020-12-27 21:26 ` [PATCH 0/5] Introduce the for_each_set_clump macro Linus Walleij
2020-12-27 21:26   ` Linus Walleij
2021-01-05 14:19   ` Bartosz Golaszewski
2021-01-05 14:19     ` Bartosz Golaszewski
2021-01-05 14:39     ` Andy Shevchenko
2021-01-05 14:39       ` Andy Shevchenko
2021-01-06  7:27       ` Bartosz Golaszewski
2021-01-06  7:27         ` Bartosz Golaszewski
2021-01-06  8:19         ` William Breathitt Gray
2021-01-06  8:19           ` William Breathitt Gray
2021-01-05 11:09 ` Michal Simek
2021-01-05 11:09   ` Michal Simek

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=X+nLT8bMsKJb7nug@shinobu \
    --to=vilhelm.gray@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=amit.kucheria@verdurent.com \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=arnd@arndb.de \
    --cc=arnd@kernel.org \
    --cc=bgolaszewski@baylibre.com \
    --cc=daniel.lezcano@linaro.org \
    --cc=linus.walleij@linaro.org \
    --cc=linux-arch@vger.kernel.org \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-gpio@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pm@vger.kernel.org \
    --cc=michal.simek@xilinx.com \
    --cc=rrichter@marvell.com \
    --cc=rui.zhang@intel.com \
    --cc=syednwaris@gmail.com \
    --cc=yamada.masahiro@socionext.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.