linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] x86: prefer tzcnt over bsf
@ 2012-09-10 11:24 Jan Beulich
  2012-09-14  6:23 ` [tip:x86/asm] x86: Prefer TZCNT over BFS tip-bot for Jan Beulich
  0 siblings, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2012-09-10 11:24 UTC (permalink / raw)
  To: mingo, tglx, hpa; +Cc: linux-kernel

Following a relatively recent compiler change, make use of the fact
that for non-zero input BSF and TZCNT produce the same result, and that
CPUs not knowing of TZCNT will treat the instruction as BSF (i.e.
ignore what looks like a REP prefix to them). The assumption here is
that TZCNT would never have worse performance than BSF.

For the moment, only do this when the respective generic-CPU option is
selected (as there are no specific-CPU options covering the CPUs
supporting TZCNT), and don't do that when size optimization was
requested.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

---
 arch/x86/include/asm/bitops.h |   19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

--- 3.6-rc5/arch/x86/include/asm/bitops.h
+++ 3.6-rc5-x86-bsf-tzcnt/arch/x86/include/asm/bitops.h
@@ -347,6 +347,19 @@ static int test_bit(int nr, const volati
 	 ? constant_test_bit((nr), (addr))	\
 	 : variable_test_bit((nr), (addr)))
 
+#if (defined(CONFIG_X86_GENERIC) || defined(CONFIG_GENERIC_CPU)) \
+    && !defined(CONFIG_CC_OPTIMIZE_FOR_SIZE)
+/*
+ * Since BSF and TZCNT have sufficiently similar semantics for the purposes
+ * for which we use them here, BMI-capable hardware will decode the prefixed
+ * variant as 'tzcnt ...' and may execute that faster than 'bsf ...', while
+ * older hardware will ignore the REP prefix and decode it as 'bsf ...'.
+ */
+# define BSF_PREFIX "rep;"
+#else
+# define BSF_PREFIX
+#endif
+
 /**
  * __ffs - find first set bit in word
  * @word: The word to search
@@ -355,7 +368,7 @@ static int test_bit(int nr, const volati
  */
 static inline unsigned long __ffs(unsigned long word)
 {
-	asm("bsf %1,%0"
+	asm(BSF_PREFIX "bsf %1,%0"
 		: "=r" (word)
 		: "rm" (word));
 	return word;
@@ -369,12 +382,14 @@ static inline unsigned long __ffs(unsign
  */
 static inline unsigned long ffz(unsigned long word)
 {
-	asm("bsf %1,%0"
+	asm(BSF_PREFIX "bsf %1,%0"
 		: "=r" (word)
 		: "r" (~word));
 	return word;
 }
 
+#undef BSF_PREFIX
+
 /*
  * __fls: find last set bit in word
  * @word: The word to search




^ permalink raw reply	[flat|nested] 11+ messages in thread

* [tip:x86/asm] x86: Prefer TZCNT over BFS
  2012-09-10 11:24 [PATCH] x86: prefer tzcnt over bsf Jan Beulich
@ 2012-09-14  6:23 ` tip-bot for Jan Beulich
  2012-09-14 21:14   ` Linus Torvalds
  0 siblings, 1 reply; 11+ messages in thread
From: tip-bot for Jan Beulich @ 2012-09-14  6:23 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, jbeulich, JBeulich, tglx

Commit-ID:  5870661c091e827973674cc3469b50c959008c2b
Gitweb:     http://git.kernel.org/tip/5870661c091e827973674cc3469b50c959008c2b
Author:     Jan Beulich <JBeulich@suse.com>
AuthorDate: Mon, 10 Sep 2012 12:24:43 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 13 Sep 2012 17:44:01 +0200

x86: Prefer TZCNT over BFS

Following a relatively recent compiler change, make use of the
fact that for non-zero input BSF and TZCNT produce the same
result, and that CPUs not knowing of TZCNT will treat the
instruction as BSF (i.e. ignore what looks like a REP prefix to
them). The assumption here is that TZCNT would never have worse
performance than BSF.

For the moment, only do this when the respective generic-CPU
option is selected (as there are no specific-CPU options
covering the CPUs supporting TZCNT), and don't do that when size
optimization was requested.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/504DEA1B020000780009A277@nat28.tlf.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/include/asm/bitops.h |   19 +++++++++++++++++--
 1 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h
index ebaee69..b2af664 100644
--- a/arch/x86/include/asm/bitops.h
+++ b/arch/x86/include/asm/bitops.h
@@ -347,6 +347,19 @@ static int test_bit(int nr, const volatile unsigned long *addr);
 	 ? constant_test_bit((nr), (addr))	\
 	 : variable_test_bit((nr), (addr)))
 
+#if (defined(CONFIG_X86_GENERIC) || defined(CONFIG_GENERIC_CPU)) \
+    && !defined(CONFIG_CC_OPTIMIZE_FOR_SIZE)
+/*
+ * Since BSF and TZCNT have sufficiently similar semantics for the purposes
+ * for which we use them here, BMI-capable hardware will decode the prefixed
+ * variant as 'tzcnt ...' and may execute that faster than 'bsf ...', while
+ * older hardware will ignore the REP prefix and decode it as 'bsf ...'.
+ */
+# define BSF_PREFIX "rep;"
+#else
+# define BSF_PREFIX
+#endif
+
 /**
  * __ffs - find first set bit in word
  * @word: The word to search
@@ -355,7 +368,7 @@ static int test_bit(int nr, const volatile unsigned long *addr);
  */
 static inline unsigned long __ffs(unsigned long word)
 {
-	asm("bsf %1,%0"
+	asm(BSF_PREFIX "bsf %1,%0"
 		: "=r" (word)
 		: "rm" (word));
 	return word;
@@ -369,12 +382,14 @@ static inline unsigned long __ffs(unsigned long word)
  */
 static inline unsigned long ffz(unsigned long word)
 {
-	asm("bsf %1,%0"
+	asm(BSF_PREFIX "bsf %1,%0"
 		: "=r" (word)
 		: "r" (~word));
 	return word;
 }
 
+#undef BSF_PREFIX
+
 /*
  * __fls: find last set bit in word
  * @word: The word to search

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [tip:x86/asm] x86: Prefer TZCNT over BFS
  2012-09-14  6:23 ` [tip:x86/asm] x86: Prefer TZCNT over BFS tip-bot for Jan Beulich
@ 2012-09-14 21:14   ` Linus Torvalds
  2012-09-14 21:32     ` Borislav Petkov
  2012-09-17  7:41     ` Jan Beulich
  0 siblings, 2 replies; 11+ messages in thread
From: Linus Torvalds @ 2012-09-14 21:14 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, torvalds, JBeulich, tglx; +Cc: linux-tip-commits

On Thu, Sep 13, 2012 at 11:23 PM, tip-bot for Jan Beulich
<JBeulich@suse.com> wrote:
>
> x86: Prefer TZCNT over BFS

This patch is insane.

> For the moment, only do this when the respective generic-CPU
> option is selected (as there are no specific-CPU options
> covering the CPUs supporting TZCNT), and don't do that when size
> optimization was requested.

This is pure garbage.

Anybody who thinks this:

> +#if (defined(CONFIG_X86_GENERIC) || defined(CONFIG_GENERIC_CPU)) \
> +    && !defined(CONFIG_CC_OPTIMIZE_FOR_SIZE)

is a good idea should be shot. Don't do it.

Introduce a new CONFIG variable with a sane name, for chrissake, the
same way we have CONFIG_X86_XADD etc. It would be logical to call it
X86_TZCNT, wouldn't it?

And then add sane rules for that in the x86 config file. And no, the
above is *NOT* a sane rule at all. If I read that right, it will
enable TZCNT even for old 32-bit CPU's, for example. That's just
f*cking insane.

Stop this kind of idiocy. The code looks bad, and the logic is pure shit too.

                Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tip:x86/asm] x86: Prefer TZCNT over BFS
  2012-09-14 21:14   ` Linus Torvalds
@ 2012-09-14 21:32     ` Borislav Petkov
  2012-09-17  7:41     ` Jan Beulich
  1 sibling, 0 replies; 11+ messages in thread
From: Borislav Petkov @ 2012-09-14 21:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: mingo, hpa, linux-kernel, JBeulich, tglx, linux-tip-commits

On Fri, Sep 14, 2012 at 02:14:57PM -0700, Linus Torvalds wrote:
> On Thu, Sep 13, 2012 at 11:23 PM, tip-bot for Jan Beulich
> <JBeulich@suse.com> wrote:
> >
> > x86: Prefer TZCNT over BFS
> 
> This patch is insane.
> 
> > For the moment, only do this when the respective generic-CPU
> > option is selected (as there are no specific-CPU options
> > covering the CPUs supporting TZCNT), and don't do that when size
> > optimization was requested.
> 
> This is pure garbage.
> 
> Anybody who thinks this:
> 
> > +#if (defined(CONFIG_X86_GENERIC) || defined(CONFIG_GENERIC_CPU)) \
> > +    && !defined(CONFIG_CC_OPTIMIZE_FOR_SIZE)
> 
> is a good idea should be shot. Don't do it.
> 
> Introduce a new CONFIG variable with a sane name, for chrissake, the
> same way we have CONFIG_X86_XADD etc. It would be logical to call it
> X86_TZCNT, wouldn't it?
> 
> And then add sane rules for that in the x86 config file. And no, the
> above is *NOT* a sane rule at all. If I read that right, it will
> enable TZCNT even for old 32-bit CPU's, for example. That's just
> f*cking insane.
> 
> Stop this kind of idiocy. The code looks bad, and the logic is pure shit too.

And the other important question is, is this even worth the complexity?
I mean "may execute that faster than 'bsf ...'" doesn't mean a lot so
can anyone remind me again why we're doing this?

Any hot paths I've missed?

Thanks.

-- 
Regards/Gruss,
    Boris.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tip:x86/asm] x86: Prefer TZCNT over BFS
  2012-09-14 21:14   ` Linus Torvalds
  2012-09-14 21:32     ` Borislav Petkov
@ 2012-09-17  7:41     ` Jan Beulich
  2012-09-17 10:00       ` Ingo Molnar
  1 sibling, 1 reply; 11+ messages in thread
From: Jan Beulich @ 2012-09-17  7:41 UTC (permalink / raw)
  To: mingo; +Cc: tglx, torvalds, linux-kernel, hpa

>>> On 14.09.12 at 23:14, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> On Thu, Sep 13, 2012 at 11:23 PM, tip-bot for Jan Beulich
> <JBeulich@suse.com> wrote:
>>
>> x86: Prefer TZCNT over BFS
> 
> This patch is insane.

I'm not going to defend this simple and strait forward a patch to
such a reply, so Ingo, just drop it to please Linus, even though I
can't see what's _this_ bad about it.

Jan


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tip:x86/asm] x86: Prefer TZCNT over BFS
  2012-09-17  7:41     ` Jan Beulich
@ 2012-09-17 10:00       ` Ingo Molnar
  2012-09-17 10:58         ` Jan Beulich
  2012-09-17 17:00         ` Linus Torvalds
  0 siblings, 2 replies; 11+ messages in thread
From: Ingo Molnar @ 2012-09-17 10:00 UTC (permalink / raw)
  To: Jan Beulich; +Cc: tglx, torvalds, linux-kernel, hpa


* Jan Beulich <JBeulich@suse.com> wrote:

> >>> On 14.09.12 at 23:14, Linus Torvalds <torvalds@linux-foundation.org> wrote:
> > On Thu, Sep 13, 2012 at 11:23 PM, tip-bot for Jan Beulich
> > <JBeulich@suse.com> wrote:
> >>
> >> x86: Prefer TZCNT over BFS
> > 
> > This patch is insane.
> 
> I'm not going to defend this simple and strait forward a patch 
> to such a reply, so Ingo, just drop it to please Linus, even 
> though I can't see what's _this_ bad about it.

Well, the problem is the complication of the dependencies, which 
obscures the logic and makes it hard to maintain going forward.

Linus's suggestion is to introduce CONFIG_X86_TZCNT as an add-on 
patch that cleans up and documents what this is all about.

Would you be willing to do such a patch?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tip:x86/asm] x86: Prefer TZCNT over BFS
  2012-09-17 10:00       ` Ingo Molnar
@ 2012-09-17 10:58         ` Jan Beulich
  2012-09-17 17:00         ` Linus Torvalds
  1 sibling, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2012-09-17 10:58 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: tglx, torvalds, linux-kernel, hpa

>>> On 17.09.12 at 12:00, Ingo Molnar <mingo@kernel.org> wrote:
> Linus's suggestion is to introduce CONFIG_X86_TZCNT as an add-on 
> patch that cleans up and documents what this is all about.

It wasn't just this he complained about. Plus I don't think the
suggested name fits the current scheme - it would imply the
instruction to be available when set, yet that's explicitly _not_
a requirement. So X86_USE_TZCNT would seem a better fit
to me.

> Would you be willing to do such a patch?

If that was the only change needed, yes (hesitantly, as I
personally don't think introducing Kconfig options that have
only a single usage point is really useful). But the complaint
regarding this also getting enabled for old CPUs (which in
my understanding is the purpose of the GENERIC options)
is something that I wouldn't be willing to address (unless I'm
told these two options have another purpose), and thus I
would expect the resulting patch to still get vetoed by Linus
(which in essence means it would be wasted time to try to
address the other complaint).

Jan


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tip:x86/asm] x86: Prefer TZCNT over BFS
  2012-09-17 10:00       ` Ingo Molnar
  2012-09-17 10:58         ` Jan Beulich
@ 2012-09-17 17:00         ` Linus Torvalds
  2012-09-17 17:05           ` H. Peter Anvin
  1 sibling, 1 reply; 11+ messages in thread
From: Linus Torvalds @ 2012-09-17 17:00 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Jan Beulich, tglx, linux-kernel, hpa

On Mon, Sep 17, 2012 at 3:00 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> Linus's suggestion is to introduce CONFIG_X86_TZCNT as an add-on
> patch that cleans up and documents what this is all about.
>
> Would you be willing to do such a patch?

As Jan already pointed out, I really want more than that.

The *conditions* for selecting X86_TZCNT have to be sane too.

I really think that "32-bit generic kernel" is totally and completely
a wrong choice for enabling this. A 32-bit setup has exactly two
relevant cases:

 - old CPU's that don't support TZCNT anyway
 - new CPU's where the user doesn't care about performance (things
like HIGHMEM will have killed it much more than TZCNT etc ever will)

and in neither case is the TZCNT instruction relevant.

Even for the 64-bit case, I really don't see why "generic" should pick
it up either. The fact that we don't have many CPU optimization
choices for x86-64 is not an excuse to then overload "generic" with
that kind of choice, I think.

So I really objected not only to the ugly and unreadable conditionals,
I objected to the criteria for them too.

            Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tip:x86/asm] x86: Prefer TZCNT over BFS
  2012-09-17 17:00         ` Linus Torvalds
@ 2012-09-17 17:05           ` H. Peter Anvin
  2012-09-17 17:13             ` Linus Torvalds
  2012-09-18  6:40             ` Jan Beulich
  0 siblings, 2 replies; 11+ messages in thread
From: H. Peter Anvin @ 2012-09-17 17:05 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, Jan Beulich, tglx, linux-kernel

On 09/17/2012 10:00 AM, Linus Torvalds wrote:
\>
> As Jan already pointed out, I really want more than that.
> 
> The *conditions* for selecting X86_TZCNT have to be sane too.
> 
> I really think that "32-bit generic kernel" is totally and completely
> a wrong choice for enabling this. A 32-bit setup has exactly two
> relevant cases:
> 
>  - old CPU's that don't support TZCNT anyway
>  - new CPU's where the user doesn't care about performance (things
> like HIGHMEM will have killed it much more than TZCNT etc ever will)
> 
> and in neither case is the TZCNT instruction relevant.
> 
> Even for the 64-bit case, I really don't see why "generic" should pick
> it up either. The fact that we don't have many CPU optimization
> choices for x86-64 is not an excuse to then overload "generic" with
> that kind of choice, I think.
> 
> So I really objected not only to the ugly and unreadable conditionals,
> I objected to the criteria for them too.
> 

Honestly, I don't see a reason to not do this unconditionally.  On
anything but (pre-686-era) extremely old CPUs the cost of the extra
prefix is zero.  On the really old CPUs the cost of the BSF instruction
will dwarf the penalty cycle for the extra prefix, so the whole
mechanism around doing it conditionally seems pointless.

	-hpa



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tip:x86/asm] x86: Prefer TZCNT over BFS
  2012-09-17 17:05           ` H. Peter Anvin
@ 2012-09-17 17:13             ` Linus Torvalds
  2012-09-18  6:40             ` Jan Beulich
  1 sibling, 0 replies; 11+ messages in thread
From: Linus Torvalds @ 2012-09-17 17:13 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ingo Molnar, Jan Beulich, tglx, linux-kernel

On Mon, Sep 17, 2012 at 10:05 AM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> Honestly, I don't see a reason to not do this unconditionally.

Possibly. "unconditional" is certainly a lot saner than "conditionally
on totally insane things".

                 Linus

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [tip:x86/asm] x86: Prefer TZCNT over BFS
  2012-09-17 17:05           ` H. Peter Anvin
  2012-09-17 17:13             ` Linus Torvalds
@ 2012-09-18  6:40             ` Jan Beulich
  1 sibling, 0 replies; 11+ messages in thread
From: Jan Beulich @ 2012-09-18  6:40 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Ingo Molnar, tglx, Linus Torvalds, linux-kernel

>>> On 17.09.12 at 19:05, "H. Peter Anvin" <hpa@zytor.com> wrote:
> Honestly, I don't see a reason to not do this unconditionally.  On
> anything but (pre-686-era) extremely old CPUs the cost of the extra
> prefix is zero.  On the really old CPUs the cost of the BSF instruction
> will dwarf the penalty cycle for the extra prefix, so the whole
> mechanism around doing it conditionally seems pointless.

I certainly don't mind re-submitting with the whole thing made
unconditional (implying that the prior version would be dropped
in any case).

Jan


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2012-09-18  6:40 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-09-10 11:24 [PATCH] x86: prefer tzcnt over bsf Jan Beulich
2012-09-14  6:23 ` [tip:x86/asm] x86: Prefer TZCNT over BFS tip-bot for Jan Beulich
2012-09-14 21:14   ` Linus Torvalds
2012-09-14 21:32     ` Borislav Petkov
2012-09-17  7:41     ` Jan Beulich
2012-09-17 10:00       ` Ingo Molnar
2012-09-17 10:58         ` Jan Beulich
2012-09-17 17:00         ` Linus Torvalds
2012-09-17 17:05           ` H. Peter Anvin
2012-09-17 17:13             ` Linus Torvalds
2012-09-18  6:40             ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).