* Re: [PATCH v2] minmax: clamp more efficiently by avoiding extra comparison [not found] ` <20220923155412.b0132fc62eca18817a023cd2@linux-foundation.org> @ 2022-09-24 10:37 ` Jason A. Donenfeld 2022-09-25 16:29 ` Andrew Morton 2022-09-26 10:00 ` Andy Shevchenko 0 siblings, 2 replies; 5+ messages in thread From: Jason A. Donenfeld @ 2022-09-24 10:37 UTC (permalink / raw) To: Andrew Morton; +Cc: Andy Shevchenko, linux-kernel, Kees Cook, linux-toolchains Hi Andrew, On Fri, Sep 23, 2022 at 03:54:12PM -0700, Andrew Morton wrote: > On Fri, 23 Sep 2022 17:40:01 +0200 "Jason A. Donenfeld" <Jason@zx2c4.com> wrote: > > > Currently the clamp algorithm does: > > > > if (val > hi) > > val = hi; > > if (val < lo) > > val = lo; > > > > But since hi > lo by definition, this can be made more efficient with: > > > > if (val > hi) > > val = hi; > > else if (val < lo) > > val = lo; > > > > So fix up the clamp and clamp_t functions to do this, adding the same > > argument checking as for min and min_t. > > > > The patch adds 140 bytes of text to mm/memblock.o, for example. > Presumably from the additional branch. Larger text means larger cache > footprint means slower. > > So where's the proof that this change gives us a more efficient kernel? For x86, code generation ought to be the same, because the compiler can still generate cmovs for all: unsigned int clamp1(unsigned int val, unsigned int lo, unsigned int hi) { if (val >= hi) val = hi; if (val <= lo) val = lo; return val; } unsigned int clamp2(unsigned int val, unsigned int lo, unsigned int hi) { if (val >= hi) val = hi; else if (val <= lo) val = lo; return val; } On x86_64 this is: clamp1: cmp edi, edx mov eax, esi cmova edi, edx cmp edi, esi cmovnb eax, edi ret clamp2: cmp edi, esi mov eax, edx cmovnb esi, edi cmp edi, edx cmovb eax, esi ret The latter one clever compares hi and lo first. I observe the same when hi and lo are constants instead. So no change. On ARM64 the same thing happens: clamp1: cmp w0, w2 csel w8, w0, w2, lo cmp w8, w1 csel w0, w8, w1, hi ret clamp2: cmp w0, w1 csel w8, w0, w1, hi cmp w0, w2 csel w0, w8, w2, lo ret On MIPS64, on the other hand, we save some arithmetic and the number of branches remains the same: clamp1: sltu $3,$6,$4 bne $3,$0,.L2 move $2,$6 move $2,$4 .L2: sltu $3,$2,$5 bnel $3,$0,.L7 move $2,$5 .L7: jr $31 nop clamp2: sltu $3,$4,$6 beq $3,$0,.L13 move $2,$6 sltu $3,$4,$5 bne $3,$0,.L12 move $2,$4 .L13: jr $31 nop .L12: jr $31 move $2,$5 So it seems like, at least in isolation, this is only a win? It's possible that when inlined into a more complex function that the various cases are optimized together and a branch is introduced if the compiler thinks its better; but alone I'm not seeing that happen. Or maybe older compilers do something worse? On x86_64, clang doesn't do the smart thing until clang 13 and gcc not until gcc 11. So maybe your text size blew up because your compiler is old? Either way, I agree that text size increase is not a good idea, and we should avoid that if we can. Worth noting, by the way, is that the input validation check already caught a bug when 0day test bot choked: https://lore.kernel.org/linux-hwmon/20220924101151.4168414-1-Jason@zx2c4.com/ So, options: 1) Keep this patch as-is, because it is useful on modern compilers. 2) Add an ifdef on compiler version, so we generate the best code in each case. 2) Go back to testing twice, but keep the checker macro because it's apparently useful. 4) Do nothing and discard this series. Any of those are okay with me. Opinions? Jason ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2] minmax: clamp more efficiently by avoiding extra comparison 2022-09-24 10:37 ` [PATCH v2] minmax: clamp more efficiently by avoiding extra comparison Jason A. Donenfeld @ 2022-09-25 16:29 ` Andrew Morton 2022-09-26 10:00 ` Andy Shevchenko 1 sibling, 0 replies; 5+ messages in thread From: Andrew Morton @ 2022-09-25 16:29 UTC (permalink / raw) To: Jason A. Donenfeld Cc: Andy Shevchenko, linux-kernel, Kees Cook, linux-toolchains On Sat, 24 Sep 2022 12:37:26 +0200 "Jason A. Donenfeld" <Jason@zx2c4.com> wrote: > Or maybe older compilers do something worse? On x86_64, clang doesn't do > the smart thing until clang 13 and gcc not until gcc 11. So maybe your > text size blew up because your compiler is old? I used gcc-11.1.0. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2] minmax: clamp more efficiently by avoiding extra comparison 2022-09-24 10:37 ` [PATCH v2] minmax: clamp more efficiently by avoiding extra comparison Jason A. Donenfeld 2022-09-25 16:29 ` Andrew Morton @ 2022-09-26 10:00 ` Andy Shevchenko 2022-09-26 12:23 ` Jason A. Donenfeld 1 sibling, 1 reply; 5+ messages in thread From: Andy Shevchenko @ 2022-09-26 10:00 UTC (permalink / raw) To: Jason A. Donenfeld Cc: Andrew Morton, linux-kernel, Kees Cook, linux-toolchains On Sat, Sep 24, 2022 at 12:37:26PM +0200, Jason A. Donenfeld wrote: > On Fri, Sep 23, 2022 at 03:54:12PM -0700, Andrew Morton wrote: > > On Fri, 23 Sep 2022 17:40:01 +0200 "Jason A. Donenfeld" <Jason@zx2c4.com> wrote: ... > Worth noting, by the way, is that the input validation check already > caught a bug when 0day test bot choked: > > https://lore.kernel.org/linux-hwmon/20220924101151.4168414-1-Jason@zx2c4.com/ Hooray, it was a good idea! :-) > So, options: > 1) Keep this patch as-is, because it is useful on modern compilers. > 2) Add an ifdef on compiler version, so we generate the best code in > each case. > 3) Go back to testing twice, but keep the checker macro because it's > apparently useful. > 4) Do nothing and discard this series. > > Any of those are okay with me. Opinions? I tend to case 3) (I believe you typo'ed double 2) cases) and apply the rest as a separate change with all downsides explained (kinda 1) approach). -- With Best Regards, Andy Shevchenko ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2] minmax: clamp more efficiently by avoiding extra comparison 2022-09-26 10:00 ` Andy Shevchenko @ 2022-09-26 12:23 ` Jason A. Donenfeld 2022-09-26 18:30 ` Kees Cook 0 siblings, 1 reply; 5+ messages in thread From: Jason A. Donenfeld @ 2022-09-26 12:23 UTC (permalink / raw) To: Andy Shevchenko; +Cc: Andrew Morton, linux-kernel, Kees Cook, linux-toolchains On Mon, Sep 26, 2022 at 12:00 PM Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote: > > On Sat, Sep 24, 2022 at 12:37:26PM +0200, Jason A. Donenfeld wrote: > > On Fri, Sep 23, 2022 at 03:54:12PM -0700, Andrew Morton wrote: > > > On Fri, 23 Sep 2022 17:40:01 +0200 "Jason A. Donenfeld" <Jason@zx2c4.com> wrote: > > ... > > > Worth noting, by the way, is that the input validation check already > > caught a bug when 0day test bot choked: > > > > https://lore.kernel.org/linux-hwmon/20220924101151.4168414-1-Jason@zx2c4.com/ > > Hooray, it was a good idea! :-) > > > So, options: > > 1) Keep this patch as-is, because it is useful on modern compilers. > > 2) Add an ifdef on compiler version, so we generate the best code in > > each case. > > 3) Go back to testing twice, but keep the checker macro because it's > > apparently useful. > > 4) Do nothing and discard this series. > > > > Any of those are okay with me. Opinions? > > I tend to case 3) (I believe you typo'ed double 2) cases) and apply the rest > as a separate change with all downsides explained (kinda 1) approach). Alright, I'll do that. v3 on its way, then. ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v2] minmax: clamp more efficiently by avoiding extra comparison 2022-09-26 12:23 ` Jason A. Donenfeld @ 2022-09-26 18:30 ` Kees Cook 0 siblings, 0 replies; 5+ messages in thread From: Kees Cook @ 2022-09-26 18:30 UTC (permalink / raw) To: Jason A. Donenfeld Cc: Andy Shevchenko, Andrew Morton, linux-kernel, linux-toolchains On Mon, Sep 26, 2022 at 02:23:48PM +0200, Jason A. Donenfeld wrote: > On Mon, Sep 26, 2022 at 12:00 PM Andy Shevchenko > <andriy.shevchenko@linux.intel.com> wrote: > > > > On Sat, Sep 24, 2022 at 12:37:26PM +0200, Jason A. Donenfeld wrote: > > > On Fri, Sep 23, 2022 at 03:54:12PM -0700, Andrew Morton wrote: > > > > On Fri, 23 Sep 2022 17:40:01 +0200 "Jason A. Donenfeld" <Jason@zx2c4.com> wrote: > > > > ... > > > > > Worth noting, by the way, is that the input validation check already > > > caught a bug when 0day test bot choked: > > > > > > https://lore.kernel.org/linux-hwmon/20220924101151.4168414-1-Jason@zx2c4.com/ > > > > Hooray, it was a good idea! :-) > > > > > So, options: > > > 1) Keep this patch as-is, because it is useful on modern compilers. > > > 2) Add an ifdef on compiler version, so we generate the best code in > > > each case. > > > 3) Go back to testing twice, but keep the checker macro because it's > > > apparently useful. > > > 4) Do nothing and discard this series. > > > > > > Any of those are okay with me. Opinions? > > > > I tend to case 3) (I believe you typo'ed double 2) cases) and apply the rest > > as a separate change with all downsides explained (kinda 1) approach). > > Alright, I'll do that. v3 on its way, then. Cool. I've dropped v2 from my -next tree. -- Kees Cook ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2022-09-26 18:31 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <CAHmME9rH47UFp6sXbDU0UZrTosFrDAa+m_FtqMqRFFNzmOzTdA@mail.gmail.com> [not found] ` <20220923154001.4074849-1-Jason@zx2c4.com> [not found] ` <20220923155412.b0132fc62eca18817a023cd2@linux-foundation.org> 2022-09-24 10:37 ` [PATCH v2] minmax: clamp more efficiently by avoiding extra comparison Jason A. Donenfeld 2022-09-25 16:29 ` Andrew Morton 2022-09-26 10:00 ` Andy Shevchenko 2022-09-26 12:23 ` Jason A. Donenfeld 2022-09-26 18:30 ` Kees Cook
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).